gfortran how do I increment random_seed by 2^128 - fortran

The gfortran documentation for random_seed says that when OpenMP threads are used, each thread increments its seed by 2^128. I am wondering how I can apply that 2^128 increment manually. I wrote a little test program that sets the master seed to all zeros and then prints the per-thread seeds, but I don't understand what I'm seeing. What I'd like to know is, for example, what to put in the subroutine increment_by_2_tothe_128 below.
program main
  use omp_lib          ! for omp_set_num_threads
  implicit none
  character(len=32) :: arg
  integer :: n
  integer :: i
  integer :: nthreads
  integer, allocatable :: seed(:, :)
  integer, allocatable :: master_seed(:)
  real, allocatable :: rn(:)

  call get_command_argument(1, arg)
  read(arg, *) nthreads

  call random_seed(size=n)
  allocate(seed(n, nthreads))
  allocate(master_seed(n))
  allocate(rn(nthreads))
  master_seed = 0
  seed = 0
  call random_seed(put=master_seed)
  ! call increment_by_2_tothe_128(n)

  call omp_set_num_threads(nthreads)
  !$OMP PARALLEL DO
  do i = 1, nthreads
    call random_number(rn(i))
    call random_seed(get=seed(:,i))
  end do

  do i = 1, nthreads
    print *, i
    print *, rn(i)
    print *, seed(:,i)
  end do
end program main
subroutine increment_by_2_tothe_128(n)
  implicit none
  integer, intent(in) :: n
  integer :: current_seed(n)
  integer :: increment_seed(n)
  call random_seed(get=current_seed)
  ! what goes here:
  ! increment_seed = current_seed + 2**128
  call random_seed(put=increment_seed)
end subroutine increment_by_2_tothe_128

You cannot do that manually. You would need access to the internals of the random number generator to do it, but those internals are not exposed to Fortran programmers. And you obviously cannot call the generator 2^128 times.
If you need such a jump, you have to use a pseudo-random number generator that exposes its internals and at the same time supports this kind of shift. That can be, for example, the xoroshiro PRNG family that is used internally by gfortran. These generators have a specialized jump function for exactly this shift:
All generators, being based on linear recurrences, provide jump
functions that make it possible to simulate any number of calls to the
next-state function in constant time, once a suitable jump polynomial
has been computed. We provide ready-made jump functions for a number
of calls equal to the square root of the period, to make it easy
generating non-overlapping sequences for parallel computations, and
equal to the cube of the fourth root of the period, to make it
possible to generate independent sequences on different parallel
processors.
These generators are most often implemented in C, but Fortran implementations also exist (subroutine rng_jump is the jump function, disclaimer: the link goes to my repository, no guarantees for the quality).
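To make the usage pattern concrete, here is a hedged sketch of how such a jump-capable generator is typically wired into an OpenMP program. The names rng_mod, rng_state_t, rng_seed, rng_jump and rng_uniform are placeholders for whatever the library you pick actually exports (they are not gfortran intrinsics); the point is only the structure: seed one master state, then give thread i a copy that has been jumped i-1 times, so the per-thread streams start 2^128 draws apart.
! Sketch only: rng_mod and its routines are hypothetical stand-ins for a
! jump-capable PRNG library (e.g. a Fortran xoshiro/xoroshiro implementation).
program jumped_streams
  use omp_lib
  use rng_mod, only: rng_state_t, rng_seed, rng_jump, rng_uniform
  implicit none
  integer, parameter :: nthreads = 4
  type(rng_state_t) :: master, state(nthreads)
  integer :: i
  real :: r

  call rng_seed(master, 12345)      ! one master generator
  do i = 1, nthreads
    state(i) = master               ! stream i starts (i-1) jumps ahead of the master
    call rng_jump(master)           ! advance the master by one jump (2^128 draws)
  end do

  call omp_set_num_threads(nthreads)
  !$OMP PARALLEL DO PRIVATE(r)
  do i = 1, nthreads
    call rng_uniform(state(i), r)   ! each thread draws only from its own stream
    print *, i, r
  end do
  !$OMP END PARALLEL DO
end program jumped_streams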

Related

Random permutation and random seed

I am employing the Knuth algorithm to generate a random permutation of an n-tuple. This is the code. For fixed n, it generates random permutations and collects the distinct ones until it has found all n! of them. At the end it also prints the number of trials needed to find all the permutations. I have also inserted an initialization of the seed from the time (in a very simple and naive way, though). There are two options (A and B). A: the seed is set once and for all in the main program. B: the seed is set every time a random permutation is computed (in the code below this second option is commented out).
implicit none
integer :: n, ncomb
integer :: i, h, k, x
integer, allocatable :: list(:), collect(:,:)
logical :: found
integer :: trials
!
! A
!
integer :: z, values(1:8)
integer, dimension(:), allocatable :: seed
call date_and_time(values=values)
call random_seed(size=z)
allocate(seed(1:z))
seed(:) = values(8)
call random_seed(put=seed)

n = 4
ncomb = product((/(i, i=1,n)/))
allocate(list(n))
allocate(collect(n,ncomb))
trials = 0
h = 0
do
  trials = trials + 1
  list = Shuffle(n)
  found = .false.
  do k = 1, h
    x = sum(abs(list - collect(:,k)))
    if ( x == 0 ) then
      found = .true.
      exit
    end if
  end do
  if ( .not. found ) then
    h = h + 1
    collect(:,h) = list
    print*, h, ')', collect(:,h)
  end if
  if ( h == ncomb ) exit
end do
write(*,*) "Trials= ", trials

contains

function Shuffle(n) result(list)
  integer, allocatable :: list(:)
  integer, intent(in) :: n
  integer :: i, randpos, temp, h
  real :: r
  !
  ! B
  !
  ! integer :: z,values(1:8)
  ! integer, dimension(:), allocatable :: seed
  ! call date_and_time(values=values)
  ! call random_seed(size=z)
  ! allocate(seed(1:z))
  ! seed(:) = values(8)
  ! call random_seed(put=seed)
  allocate(list(n))
  list = (/ (h, h=1,n) /)
  do i = n, 2, -1
    call random_number(r)
    randpos = int(r * i) + 1
    temp = list(randpos)
    list(randpos) = list(i)
    list(i) = temp
  end do
end function Shuffle
end
You can check that the second option is not good at all. For n=4 it takes around 100 times more trials to obtain the total number of permutations and for n=5 it gets stuck.
My questions are:
Why does calling random_seed multiple times give wrong results? What kind of systematic error am I introducing? Isn't it equivalent to calling random_seed only once but launching the code several times (each time generating only one random permutation)?
If I want to launch the code several times, computing a single permutation each time, I guess that initializing the random seed gives me the same problem (regardless of where the initialization is placed, since now I am computing only one permutation). Correct? In that case, what do I have to do to initialize the seed without spoiling the uniform sampling? If I do not initialize the seed in a random way I obtain the same permutation every time. I guess I could print and read the seed every time I launch the code, so as not to start from the same pseudo-random numbers, but this is complicated to do if I launch several instances of the code in parallel.
UPDATE
I have understood the reply. In conclusion, if I want a different pseudorandom sequence each time I initialize the seed, what I can do is:
A) Old gfortran
Use the subroutine init_random_seed() here
https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gfortran/RANDOM_005fSEED.html
B) Most recent gfortran versions
call random_seed()
C) Fortran2018
call random_init(repeatable, image_distinct)
Questions
In the C) case, should I set repeatable=.false., image_distinct=.true.
to have a different random number each time?
What could be an efficient way to write the code portably, so that it works whatever the compiler is? (I mean, the code recognizes what is available and works accordingly.)
You certainly should not ever call random_seed() repeatedly. It is supposed to be called just once. You make the problem worse by setting the seed in such a crude way.
Yes, one does often use the date and time to initialize it, but one must pre-process the time data through something that adds some entropy, for example a very simple auxiliary random generator. A good example can be found in the documentation of RANDOM_SEED for older versions of gfortran: https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gfortran/RANDOM_005fSEED.html See how lcg() is used there to transform the date_and_time() data.
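For reference, here is a condensed sketch in the spirit of that documentation example: the raw clock value is pushed through a small LCG so that the seed components differ even when the time barely changes. The constants are the ones from the gfortran manual; treat this as a seeding helper, not a general-purpose generator.
! Condensed seeding sketch, based on the init_random_seed example in the
! gfortran documentation: scramble a clock reading through a tiny LCG
! instead of feeding the raw time into random_seed.
subroutine init_random_seed()
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, allocatable :: seed(:)
  integer :: i, n
  integer(int64) :: t

  call random_seed(size=n)
  allocate(seed(n))
  call system_clock(t)       ! raw time: poor entropy on its own
  do i = 1, n
    seed(i) = lcg(t)         ! each call advances t, decorrelating the components
  end do
  call random_seed(put=seed)
contains
  ! Tiny LCG, only good enough for seeding a better generator.
  function lcg(s)
    integer :: lcg
    integer(int64), intent(inout) :: s
    if (s == 0) then
      s = 104729
    else
      s = mod(s, 4294967296_int64)
    end if
    s = mod(s * 279470273_int64, 4294967291_int64)
    lcg = int(mod(s, int(huge(0), int64)), kind(0))
  end function lcg
end subroutine init_random_seed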
Note that more recent versions of gfortran will generate a random seed that is different every time just by calling random_seed() without any arguments. Older versions returned the same seed every time.
Also note that Fortran 2018 has random_init() where you can specify repeatable= to be true or false. With false you get a different sequence every time.
The portable thing is to use standard Fortran, that's all. But you cannot use new features and old compiler versions at the same time; that kind of portability does not exist. With old compilers you can only use old standard features. I won't even start writing about autoconf and similar tools, it is not worth it.
So: you can set your random number seed to be the same every time or distinct every time (see above), and you should always call random_seed or random_init only once.
Why does calling random_seed multiple times give wrong results?
You are re-starting the pseudorandom sequence from some unspecified state, probably with insufficient entropy, and quite possibly close to the last starting state.
What kind of systematic error I am introducing? Isn't it equivalent to calling random seed only once but launching the code several times (each time generating only one random permutation)?
It might be similar. But your seeding from the time is way too naive, and when running in a loop the date and time values are far too similar, if not completely equal, in most bits. A transform like the one linked above might mask that problem, but putting the date and time itself into the seed is just not going to work.

Right way to communicate portable Fortran data types with MPI [duplicate]

I have a Fortran program where I specify the kind of the numeric data types in an attempt to retain a minimum level of precision, regardless of what compiler is used to build the program. For example:
integer, parameter :: rsp = selected_real_kind(4)
...
real(kind=rsp) :: real_var
The problem is that I have used MPI to parallelize the code and I need to make sure the MPI communications are specifying the same type with the same precision. I was using the following approach to stay consistent with the approach in my program:
call MPI_Type_create_f90_real(4,MPI_UNDEFINED,rsp_mpi,mpi_err)
...
call MPI_Send(real_var,1,rsp_mpi,dest,tag,MPI_COMM_WORLD,err)
However, I have found that this MPI routine is not particularly well-supported for different MPI implementations, so it's actually making my program non-portable. If I omit the MPI_Type_create routine, then I'm left to rely on the standard MPI_REAL and MPI_DOUBLE_PRECISION data types, but what if that type is not consistent with what selected_real_kind picks as the real type that will ultimately be passed around by MPI? Am I stuck just using the standard real declaration for a datatype, with no kind attribute and, if I do that, am I guaranteed that MPI_REAL and real are always going to have the same precision, regardless of compiler and machine?
UPDATE:
I created a simple program that demonstrates the issue I see when my internal reals have higher precision than what is afforded by the MPI_DOUBLE_PRECISION type:
program main
  use mpi
  implicit none
  integer, parameter :: rsp = selected_real_kind(16)
  integer :: err
  integer :: rank
  real(rsp) :: real_var

  call MPI_Init(err)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, err)
  if (rank.eq.0) then
    real_var = 1.123456789012345
    call MPI_Send(real_var, 1, MPI_DOUBLE_PRECISION, 1, 5, MPI_COMM_WORLD, err)
  else
    call MPI_Recv(real_var, 1, MPI_DOUBLE_PRECISION, 0, 5, MPI_COMM_WORLD, &
                  MPI_STATUS_IGNORE, err)
  end if
  print *, rank, real_var
  call MPI_Finalize(err)
end program main
If I build and run with 2 cores, I get:
0 1.12345683574676513672
1 4.71241976735884452383E-3998
Now change the 16 to a 15 in selected_real_kind and I get:
0 1.1234568357467651
1 1.1234568357467651
Is it always going to be safe to use selected_real_kind(15) with MPI_DOUBLE_PRECISION no matter what machine/compiler is used to do the build?
Use the Fortran 2008 intrinsic STORAGE_SIZE to determine the number of bytes that each number requires and send the data as bytes. Note that STORAGE_SIZE returns the size in bits, so you will need to divide by 8 to get the size in bytes.
This solution works for moving data but does not help you use reductions. For that you will have to implement a user-defined reduction operation (a sketch of the idea follows after the example below). If that's important to you, I will update my answer with the details.
For example:
program main
  use mpi
  implicit none
  integer, parameter :: rsp = selected_real_kind(16)
  integer :: err
  integer :: rank
  real(rsp) :: real_var

  call MPI_Init(err)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, err)
  if (rank.eq.0) then
    real_var = 1.123456789012345
    call MPI_Send(real_var, storage_size(real_var)/8, MPI_BYTE, 1, 5, MPI_COMM_WORLD, err)
  else
    call MPI_Recv(real_var, storage_size(real_var)/8, MPI_BYTE, 0, 5, MPI_COMM_WORLD, &
                  MPI_STATUS_IGNORE, err)
  end if
  print *, rank, real_var
  call MPI_Finalize(err)
end program main
I confirmed that this change corrects the problem and the output I see is:
0 1.12345683574676513672
1 1.12345683574676513672
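Regarding the reduction caveat above, here is a hedged sketch of the user-defined-operation idea: wrap the bytes of one real in a contiguous datatype so counts stay in elements, then register a sum that operates on the real kind directly. The names rsp_sum, rsp_type and rsp_sum_op are mine, and the code assumes the compiler actually provides a kind for selected_real_kind(16).
! Sketch of a user-defined reduction for reals transferred as raw bytes.
module byte_reduce
  implicit none
  integer, parameter :: rsp = selected_real_kind(16)   ! same request as in the question
contains
  subroutine rsp_sum(invec, inoutvec, len, datatype)
    integer :: len, datatype
    real(rsp) :: invec(len), inoutvec(len)
    inoutvec = inoutvec + invec        ! element-wise sum on our own real kind
  end subroutine rsp_sum
end module byte_reduce

program reduce_demo
  use mpi
  use byte_reduce
  implicit none
  integer :: err, rank, rsp_type, rsp_sum_op
  real(rsp) :: x, total

  call MPI_Init(err)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, err)

  ! One datatype element = the bytes of one real(rsp).
  call MPI_Type_contiguous(storage_size(x)/8, MPI_BYTE, rsp_type, err)
  call MPI_Type_commit(rsp_type, err)
  ! User-defined (commutative) sum, usable with the derived datatype.
  call MPI_Op_create(rsp_sum, .true., rsp_sum_op, err)

  x = real(rank, rsp) + 0.25_rsp
  call MPI_Allreduce(x, total, 1, rsp_type, rsp_sum_op, MPI_COMM_WORLD, err)
  if (rank == 0) print *, 'sum =', total

  call MPI_Op_free(rsp_sum_op, err)
  call MPI_Type_free(rsp_type, err)
  call MPI_Finalize(err)
end program reduce_demo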
Not really an answer, but we have the same problem and use something like this:
!> Number of digits for single precision numbers
integer, parameter, public :: single_prec = 6
!> Number of digits for double precision numbers
integer, parameter, public :: double_prec = 15
!> Number of digits for extended double precision numbers
integer, parameter, public :: xdble_prec = 18
!> Number of digits for quadruple precision numbers
integer, parameter, public :: quad_prec = 33
integer, parameter, public :: rk_prec = double_prec
!> The kind to select for default reals
integer, parameter, public :: rk = selected_real_kind(rk_prec)
And then have an initialization routine where we do:
!call mpi_type_create_f90_real(rk_prec, MPI_UNDEFINED, rk_mpi, iError)
!call mpi_type_create_f90_integer(long_prec, long_k_mpi, iError)
! Workaround for broken MPI implementations.
select case(rk_prec)
case(single_prec)
  rk_mpi = MPI_REAL
case(double_prec)
  rk_mpi = MPI_DOUBLE_PRECISION
case(quad_prec)
  rk_mpi = MPI_REAL16
case default
  write(*,*) 'unknown real type specified for mpi_type creation'
end select
long_k_mpi = MPI_INTEGER8
While this is not nice, it works reasonably well, and seems to be usable on Cray, IBM BlueGene and conventional Linux Clusters.
Best thing to do is push sites and vendors to properly support this in MPI. As far as I know it has been fixed in OpenMPI and planned to be fixed in MPICH by 3.1.1. See OpenMPI Tickets 3432 and 3435 as well as MPICH Tickets 1769 and 1770.
How about:
integer, parameter :: DOUBLE_PREC = kind(0.0d0)
integer, parameter :: SINGLE_PREC = kind(0.0e0)
integer, parameter :: MYREAL = DOUBLE_PREC

if (MYREAL .eq. DOUBLE_PREC) then
  MPIREAL = MPI_DOUBLE_PRECISION
else if (MYREAL .eq. SINGLE_PREC) then
  MPIREAL = MPI_REAL
else
  print *, "Error: Can't figure out MPI precision."
  STOP
end if
and use MPIREAL instead of MPI_DOUBLE_PRECISION from then on.

Poor scaling and a segmentation fault in a Fortran OpenMP code

I'm having some trouble when executing a program with a parallel do. Here is a test code.
module test
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  subroutine Addition(x, y, s)
    real(dp), intent(in) :: x, y
    real(dp), intent(out) :: s
    s = x + y
  end subroutine Addition

  function linspace(length, xi, xf) result (vec)
    ! function to create an equally spaced vector given a begin and end point
    real(dp), intent(in) :: xi, xf
    integer, intent(in) :: length
    real(dp), dimension(1:length) :: vec
    integer :: i
    real(dp) :: increment

    increment = (xf - xi) / (real(length) - 1)
    vec(1) = xi
    do i = 2, length
      vec(i) = vec(i-1) + increment
    end do
  end function linspace
end module test

program paralleltest
  use, intrinsic :: iso_fortran_env, only: dp => real64
  use test
  use :: omp_lib
  implicit none
  integer, parameter :: length = 1000
  real(dp), dimension(length) :: x, y
  real(dp) :: s
  integer :: i, j
  integer :: num_threads = 8
  real(dp), dimension(length,length) :: SMatrix

  x = linspace(length, .0d0, 1.0d0)
  y = linspace(length, 2.0d0, 3.0d0)

  !$ call omp_set_num_threads(num_threads)
  !$OMP PARALLEL DO
  do i = 1, size(x)
    do j = 1, size(y)
      call Addition(x(i), y(j), s)
      SMatrix(i,j) = s
    end do
  end do
  !$OMP END PARALLEL DO

  open(unit=1, file='Add6.dat')
  do i = 1, size(x)
    do j = 1, size(y)
      write(1,*) x(i), ";", y(j), ";", SMatrix(i,j)
    end do
  end do
  close(unit=1)
end program paralleltest
I'm building and running the program as follows: gfortran-8 -fopenmp paralleltest.f03 -o pt.out -mcmodel=medium, and then export OMP_NUM_THREADS=8.
This simple code raises at least two big questions for me about parallel do. The first is that if I run it with length = 1100 or greater, I get a Segmentation fault (core dumped) error message, but with smaller values it runs with no problem. The second is about the time it takes. When I run it with length = 1000 (via time ./pt.out) it takes 1.732 s, but if I run it sequentially (without the -fopenmp flag and with taskset -c 4 time ./pt.out) it takes 1.714 s. I guess the difference between the two would show up in longer and more complex code where parallelism is more useful. In fact, when I tried it with more complex calculations running in parallel on eight threads, the time was reduced to half of the sequential time, but not to an eighth as I expected. In view of this, my questions are: is a speed-up always available, or is it code dependent? And second, is there a friendly way to control which thread runs which iterations? That is, the first thread running the first length/8 iterations, and so on, as if I were performing several taskset runs with different code, each handling the iterations I want.
As I commented, the Segmentation fault has been treated elsewhere (Why Segmentation fault is happening in this openmp code?): I would use an allocatable array, but you can also set the stack size using ulimit -s.
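For instance, making the matrix allocatable is a two-line change that moves it from the stack to the heap (the rest of the program stays the same):
! In program paralleltest, replace the fixed-size declaration with
real(dp), dimension(:,:), allocatable :: SMatrix
! and allocate it before the parallel loop:
allocate(SMatrix(length, length))   ! heap storage survives large values of length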
Regarding the time, almost all of the runtime is spent in writing the array to the external file.
But even if you remove that and measure only the time spent in the parallel section using omp_get_wtime(), and increase the problem size, it still does not scale too well. This is because there is very little computation for the CPU to do and a lot of array writing to memory (accessing main memory is slow - cache misses).
As Jean-Claude Arbaut pointed out, your loop order is wrong and makes accessing the memory even slower. Some compilers can change that for you with higher optimization levels (-O2 or -O3), but only some of them.
And even worse, as Jim Cownie pointed out, you have a race condition. Multiple threads try to use the same s for both reading and writing, so the program is invalid. You need to make s private using private(s).
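Putting those fixes together, a hedged sketch of the corrected parallel section could look like the following (j is now the outer, parallelized loop so that SMatrix(i,j) is traversed contiguously, s is private, and t0/t1 are real(dp) variables added only to time the region with omp_get_wtime):
! Corrected parallel section: contiguous memory access and no race on s.
t0 = omp_get_wtime()
!$OMP PARALLEL DO PRIVATE(i, s)
do j = 1, size(y)
  do i = 1, size(x)
    call Addition(x(i), y(j), s)   ! s is now a per-thread temporary
    SMatrix(i, j) = s
  end do
end do
!$OMP END PARALLEL DO
t1 = omp_get_wtime()
print *, 'parallel section:', t1 - t0, 's'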
With the above fixes I get a roughly two times faster parallel section with four cores and four threads. Don't try to use hyper-threading, it slows the program down.
If you give the CPU more computational work to do, like s = Bessel_J0(x)/Bessel_J1(y), it scales pretty well for me: almost four times faster with four threads, and hyper-threading does speed it up a little bit.
Finally, I suggest just removing the manual setting of the number of threads; it is a pain for testing. If you remove it, you can easily use OMP_NUM_THREADS=4 ./a.out.

Writing assumed-size array causes "upper bound shall not be omitted..."

I am writing code to extend a closed-source finite-element framework that (because it relies on some old F77-style approaches) forces me in one place to use assumed-size arrays.
Is it possible to write an assumed-size array to the standard output, whatever its size may be?
This is not working:
module fun
  implicit none
contains
  subroutine writer(a)
    integer, dimension(*), intent(in) :: a
    write(*,*) a
  end subroutine writer
end module fun

program test
  use fun
  implicit none
  integer, dimension(2) :: a
  a(1) = 1
  a(2) = 2
  call writer(a)
end program test
With the Intel Fortran compiler throwing
error #6364: The upper bound shall not be omitted in the last dimension of a reference to an assumed size array.
The compiler does not know how large an assumed-size array is. It only has the address of the first element. You are responsible for telling it how large the array is.
write(*,*) a(1:n)
Equivalently you can use an explicit-size array
integer, intent(in) :: a(n)
and then you can do
write(*,*) a
An assumed-size array may not appear as a whole-array reference when that reference requires the shape of the array. An output item in a write statement is one such disallowed case.
So, in that sense the answer is: no, it is not possible to have the write statement as you have it.
From an assumed-size array, array sections and array elements may appear:
write (*,*) a(1:2)
write (*,*) a(1), a(2)
write (*,*) (a(i), i=1,2)
which leads simply to the question of how to get the value 2 into the subroutine; at other times the required value may be 7. Let's call it n.
Naturally, changing the subroutine is tempting:
subroutine writer (a, n)
  integer n
  integer a(n) ! or still a(*)
end subroutine
or even
subroutine writer (a)
  integer a(:)
end subroutine
One often hasn't a choice, alas, in particular when associating a procedure with a dummy procedure that has a specific interface. However, n can get into the subroutine in any of several other ways: as a module or host entity, or through a common block (avoid this one if possible). These methods do not require modifying the interface of the procedure. For example:
subroutine writer(a)
  use aux_params, only : n
  integer, dimension(*), intent(in) :: a
  write(*,*) a(1:n)
end subroutine writer
or we could have n as an entity in the module fun and have it accessible in writer through host association. In either case, the main program must set the value of n before writer is executed.
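For completeness, here is a minimal sketch of the module side and the main program, assuming the modified writer above lives in module fun as in the question (aux_params and n are the names already used above; the only requirement is that n is set before writer runs):
module aux_params
  implicit none
  integer :: n
end module aux_params

program test
  use fun
  use aux_params, only: n
  implicit none
  integer, dimension(2) :: a
  a(1) = 1
  a(2) = 2
  n = size(a)      ! make the length visible to writer via the module
  call writer(a)
end program test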

How to avoid declaring and setting the value of a variable in each subroutine?

How can I avoid repeatedly declaring a variable that has a constant value in every subroutine?
For example:
program test
  implicit none
  integer :: n
  integer :: time
  print*, "enter n" ! this will be a constant value for the whole program
  call Calcul(time)
  print*, time
end program

subroutine Calcul(Time)
  implicit none
  integer :: time
  ! I don't want to declare the constant n again and again, because sometimes
  ! the subroutines have a lot of variables.
  time = n*2
end subroutine
Sometimes there are a lot of constants defined by the user, and I will write a lot of subroutines that use those constants, so I want to store them and use them without redefining them again and again.
For global variables use modules (old FORTRAN used common blocks, but they are obsolete):
module globals
  implicit none
  integer :: n
contains
  subroutine read_globals() ! you must call this subroutine at program start
    print*, "enter n" ! this will be a constant value for the whole program
    read *, n
  end subroutine
end module

! this subroutine would be better placed in a module too!
subroutine Calcul(Time)
  use globals ! n comes from here
  implicit none
  integer :: time
  time = n*2
end subroutine

program test
  use globals ! n comes from here if needed
  implicit none
  integer :: time
  call read_globals()
  call Calcul(time)
  print*, time
end program
There are many questions and answers explaining how to use Fortran modules properly on Stack Overflow.