Random permutation and random seed - fortran

I am using the Knuth algorithm to generate a random permutation of an n-tuple. This is the code. For fixed n, it generates random permutations and collects all the distinct ones until it has found all n! permutations. At the end it also prints the number of trials needed to find them all. I have also inserted the initialization of the seed from the time (in a very simple and naive way, though). There are two options (A and B). A: the seed is set once and for all in the main program. B: the seed is set every time a random permutation is computed (below, the second option is commented out).
implicit none
integer :: n,ncomb
integer :: i,h,k,x
integer, allocatable :: list(:),collect(:,:)
logical :: found
integer :: trials
!
! A
!
integer :: z,values(1:8)
integer, dimension(:), allocatable :: seed
call date_and_time(values=values)
call random_seed(size=z)
allocate(seed(1:z))
seed(:) = values(8)
call random_seed(put=seed)
n=4
ncomb=product((/(i,i=1,n)/))
allocate(list(n))
allocate(collect(n,ncomb))
trials=0
h=0
do
trials=trials+1
list=Shuffle(n)
found=.false.
do k=1,h
x=sum(abs(list-collect(:,k)))
if ( x == 0 ) then
found=.true.
exit
end if
end do
if ( .not. found ) then
h=h+1
collect(:,h)=list
print*,h,')',collect(:,h)
end if
if ( h == ncomb ) exit
end do
write(*,*) "Trials= ",trials
contains
function Shuffle(n) result(list)
integer, allocatable :: list(:)
integer, intent(in) :: n
integer :: i, randpos, temp,h
real :: r
!
! B
!
! integer :: z,values(1:8)
! integer, dimension(:), allocatable :: seed
! call date_and_time(values=values)
! call random_seed(size=z)
! allocate(seed(1:z))
! seed(:) = values(8)
! call random_seed(put=seed)
allocate(list(n))
list = (/ (h, h=1,n) /)
do i = n, 2, -1
call random_number(r)
randpos = int(r * i) + 1
temp = list(randpos)
list(randpos) = list(i)
list(i) = temp
end do
end function Shuffle
end
You can check that the second option is not good at all. For n=4 it takes around 100 times more trials to obtain the total number of permutations and for n=5 it gets stuck.
My questions are:
Why does calling random_seed multiple times give wrong results? What kind of systematic error am I introducing? Isn't it equivalent to calling random_seed only once but launching the code several times (each time generating only one random permutation)?
If I want to launch the code several times, computing a single permutation each time, I guess that if I initialize the random seed I have the same problem (regardless of where the initialization is placed, since now I am computing only one permutation). Correct? In this case, what do I have to do to initialize the seed without spoiling the uniform sampling? If I do not initialize the seed in a random way, I obtain the same permutation every time. I guess I could write out and read back the seed every time I launch the code, so as not to start from the same pseudo-random numbers. However, this is complicated to do if I launch several instances of the code in parallel.
UPDATE
I have understood the reply. In conclusion, if I want to generate pseudorandom numbers at each call by initializing the seed, what I can do is:
A) Old gfortran
Use the subroutine init_random_seed() here
https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gfortran/RANDOM_005fSEED.html
B) Most recent gfortran versions
call random_seed()
C) Fortran2018
call random_init(repeatable, image_distinct)
Questions
In the C) case, should I set repeatable=.false., image_distinct=.true. to have a different random number each time?
What could be an efficient way to write the code in a portable way, so that it works whatever the compiler is? (I mean, the code recognizes what is available and works accordingly)

You certainly should not ever call random_seed() repeatedly. It is supposed to be called just once. You make the problem worse by setting it in such a crude way.
Yes, one does often use the date and time to initialize it, but one must pre-process the time data through something that adds some entropy, such as a very simple random generator. A good example can be found in the documentation of RANDOM_SEED for older versions of gfortran: https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gfortran/RANDOM_005fSEED.html See how lcg() is used there to transform the date_and_time() data.
Note that more recent versions of gfortran will generate a random seed that is different every time just by calling random_seed() without any arguments. Older versions returned the same seed every time.
Also note that Fortran 2018 has random_init() where you can specify repeatable= to be true or false. With false you get a different sequence every time.
The portable thing is to use standard Fortran, that's all. But you cannot use new features and old compiler versions at the same time. This kind of portability does not exist. With old compilers you can only use old standard features. I won't even start writing about autoconf and stuff, it is not worth it.
So,
you can set your random number seed to be the same every time or distinct every time (see above),
and
you should always call random_seed or random_init only once.
Why does calling random_seed multiple times give wrong results?
You are re-starting the pseudorandom sequence from some unspecified state, probably with insufficient entropy. That state can easily be very close to the previous starting state.
What kind of systematic error am I introducing? Isn't it equivalent to calling random_seed only once but launching the code several times (each time generating only one random permutation)?
It might be similar. But your seeding using the time is way too naïve, and when running in the loop the date and time values are way too similar, if not completely equal in most bits. Some transform like the one linked above might mask that problem anyway, but putting the date and time itself in as your seed is just not going to work.
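The advice above (seed once, then only draw numbers) can be sketched as a small module. This is a minimal sketch, not the poster's code: the names shuffle_demo, init_rng_once and the seeded flag are made up for illustration, and it assumes a compiler whose argument-less random_seed() picks a different seed on each run (recent gfortran does, older versions do not). With a Fortran 2018 compiler, call random_init(repeatable=.false., image_distinct=.true.) could be substituted at the marked line.

```fortran
module shuffle_demo
  implicit none
  private
  public :: init_rng_once, shuffle
  logical, save :: seeded = .false.   ! hypothetical guard flag
contains
  ! Seed the intrinsic generator at most once per run.
  subroutine init_rng_once()
    if (seeded) return
    ! Assumption: the processor chooses a fresh seed here.
    ! Fortran 2018 alternative: call random_init(.false., .true.)
    call random_seed()
    seeded = .true.
  end subroutine init_rng_once

  ! Knuth (Fisher-Yates) shuffle of 1..n; it never touches the seed.
  function shuffle(n) result(list)
    integer, intent(in) :: n
    integer :: list(n)
    integer :: i, randpos, temp
    real :: r
    list = [(i, i = 1, n)]
    do i = n, 2, -1
      call random_number(r)        ! 0 <= r < 1
      randpos = int(r * i) + 1     ! 1 <= randpos <= i
      temp = list(randpos)
      list(randpos) = list(i)
      list(i) = temp
    end do
  end function shuffle
end module shuffle_demo
```

A caller would invoke init_rng_once() once at program start and then call shuffle(n) as often as needed; repeated calls to init_rng_once() are harmless because of the guard.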

Related

gfortran how do I increment random_seed by 2^128

The gfortran page on random_seed says that when using OMP threads, each thread increments its seed by 2^128. I am wondering how I can increment the seed by 2^128 manually. I wrote a little test program that sets the master seed to all zeros and then prints the seeds, but I don't understand what I'm seeing. What I'd like to know is, for example, what I should put in the subroutine increment_by_2_tothe_128.
program main
implicit none
character(len=32) :: arg
integer :: n
integer :: i
integer :: nthreads
integer, allocatable :: seed(:, :)
integer, allocatable :: master_seed(:)
real, allocatable :: rn(:)
call get_command_argument(1, arg)
read(arg, *) nthreads
call random_seed(size=n)
allocate(seed(n, nthreads))
allocate(master_seed(n))
allocate(rn(nthreads))
master_seed = 0
seed = 0
call random_seed(put=master_seed)
! call increment_by_2_tothe_128(n)
call omp_set_num_threads(nthreads)
!$OMP PARALLEL DO
do i=1,nthreads
call random_number(rn(i))
call random_seed(get=seed(:,i))
end do
do i=1,nthreads
print *, i
print *, rn(i)
print *, seed(:,i)
end do
end program main
subroutine increment_by_2_tothe_128(n)
  implicit none
  integer, intent(in) :: n
  integer :: current_seed(n)
  integer :: increment_seed(n)
  call random_seed(get=current_seed)
  ! what goes here:
  ! increment_seed = current_seed + 2**128
  call random_seed(put=increment_seed)
end subroutine increment_by_2_tothe_128
You cannot do that manually. You need access to the internals of the random number generator to be able to do that, but they are not exposed to Fortran programmers. And you obviously cannot call the generator 2^128 times.
If you need to do the shift, you need to use a pseudo-random number generator that exposes its internals and also supports this kind of shift. That can be, for example, the xoshiro/xoroshiro PRNG family, a member of which is used internally by gfortran. These generators have a specialized function for this shift:
All generators, being based on linear recurrences, provide jump
functions that make it possible to simulate any number of calls to the
next-state function in constant time, once a suitable jump polynomial
has been computed. We provide ready-made jump functions for a number
of calls equal to the square root of the period, to make it easy
generating non-overlapping sequences for parallel computations, and
equal to the cube of the fourth root of the period, to make it
possible to generate independent sequences on different parallel
processors.
These generators are most often implemented in C, but Fortran implementations also exist (subroutine rng_jump is the jump function, disclaimer: the link goes to my repository, no guarantees for the quality).

Poor scaling and a segmentation fault in a Fortran OpenMP code

I'm having some trouble when executing a program with a parallel do. Here is a test code.
module test
use, intrinsic :: iso_fortran_env, only: dp => real64
implicit none
contains
subroutine Addition(x,y,s)
real(dp),intent(in) :: x,y
real(dp), intent(out) :: s
s = x+y
end subroutine Addition
function linspace(length,xi,xf) result (vec)
! function to create an equally spaced vector given a begin and end point
real(dp),intent(in) :: xi,xf
integer, intent(in) :: length
real(dp),dimension(1:length) :: vec
integer ::i
real(dp) :: increment
increment = (xf-xi)/(real(length)-1)
vec(1) = xi
do i = 2,length
vec(i) = vec(i-1) + increment
end do
end function linspace
end module test
program paralleltest
use, intrinsic :: iso_fortran_env, only: dp => real64
use test
use :: omp_lib
implicit none
integer, parameter :: length = 1000
real(dp),dimension(length) :: x,y
real(dp) :: s
integer:: i,j
integer :: num_threads = 8
real(dp),dimension(length,length) :: SMatrix
x = linspace(length,.0d0,1.0d0)
y = linspace(length,2.0d0,3.0d0)
!$ call omp_set_num_threads(num_threads)
!$OMP PARALLEL DO
do i=1,size(x)
do j = 1,size(y)
call Addition(x(i),y(j),s)
SMatrix(i,j) = s
end do
end do
!$OMP END PARALLEL DO
open(unit=1,file ='Add6.dat')
do i= 1,size(x)
do j= 1,size(y)
write(1,*) x(i),";",y(j),";",SMatrix(i,j)
end do
end do
close(unit=1)
end program paralleltest
I'm building and running the program in the following way: gfortran-8 -fopenmp paralleltest.f03 -o pt.out -mcmodel=medium and then export OMP_NUM_THREADS=8.
This simple code raises at least two big questions about parallel do for me. The first is that if I run it with length = 1100 or greater, I get a Segmentation fault (core dump) error message, whereas with smaller values it runs with no problem.
The second is about the time it takes. When I run it with length = 1000 (using time ./pt.out) it takes 1.732 s, but if I run it sequentially (compiled without -fopenmp and run with taskset -c 4 time ./pt.out) it takes 1.714 s. I guess the difference between the two only shows up in longer and more complex code, where parallelism is more useful. In fact, when I tried more complex calculations running in parallel with eight threads, the time was reduced to half of the sequential time, not to an eighth as I expected.
In view of this, my questions are: is any optimization always available, or is it code dependent? And second, is there a friendly way to control which thread runs which iteration? That is, the first thread running the first length/8 iterations, and so on, like performing several taskset calls with different code, each containing the iterations that I want.
As I commented, the segmentation fault has been treated elsewhere (see "Why Segmentation fault is happening in this openmp code?"). I would use an allocatable array, but you can also raise the stack size using ulimit -s.
Regarding the time, almost all of the runtime is spent in writing the array to the external file.
But even if you remove that, measure only the time spent in the parallel section using omp_get_wtime(), and increase the problem size, it still does not scale too well. This is because there is very little computation for the CPU to do and a lot of array writing to memory (accessing main memory is slow because of cache misses).
As Jean-Claude Arbaut pointed out, your loop order is wrong and makes accessing the memory even slower. Some compilers can change that for you with higher optimization levels (-O2 or -O3), but only some of them.
And even worse, as Jim Cownie pointed out, you have a race condition. Multiple threads try to use the same s for both reading and writing and the program is invalid. You need to make s private using private(s).
With the above fixes I get a roughly two times faster parallel section with four cores and four threads. Don't try to use hyper-threading, it slows the program down.
If you give the CPU more computational work to do, like s = Bessel_J0(x)/Bessel_J1(y) it scales pretty well for me, almost four times faster with four threads, and hyper threading does speed it up a little bit.
Finally, I suggest just removing the manual setting of the number of threads, it is a pain for testing. If you remove that, you can use OMP_NUM_THREADS=4 ./a.out easily.
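The fixes discussed above (a private s, a column-wise loop order, and a heap-allocated result array) can be put together roughly as follows. This is only a sketch under those assumptions: the module and subroutine names sum_matrix_demo and fill_sum_matrix are invented here, the addition is inlined instead of calling Addition, and no thread count is set in the code so that OMP_NUM_THREADS controls it. Without -fopenmp the directives are plain comments and the code runs serially.

```fortran
module sum_matrix_demo
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
contains
  ! Fill smatrix(i,j) = x(i) + y(j).
  subroutine fill_sum_matrix(x, y, smatrix)
    real(dp), intent(in)  :: x(:), y(:)
    real(dp), intent(out) :: smatrix(:,:)
    real(dp) :: s
    integer  :: i, j
    ! j is the outer (parallel) loop and i the inner one, so each
    ! thread writes down whole columns, which are contiguous in Fortran.
    ! s is private, so threads no longer race on a shared temporary.
    !$omp parallel do private(i, s)
    do j = 1, size(y)
      do i = 1, size(x)
        s = x(i) + y(j)
        smatrix(i, j) = s
      end do
    end do
    !$omp end parallel do
  end subroutine fill_sum_matrix
end module sum_matrix_demo
```

In the calling program, declaring SMatrix as allocatable and allocating it with allocate(SMatrix(length,length)) keeps the large array off the stack, which is the suggested way around the segmentation fault.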

Benchmarking fortran loop

I need to benchmark a part of a fortran program to understand and quantify the impact of specific changes (in order to make the code more maintainable we'd like to make it more OO, taking advantage of function pointers for example).
I have a loop calling several times the same subroutines to perform computations on finite elements. I want to see the impact of using function pointers instead of just hard-coded functions.
do i=1,n_of_finite_elements
! Need to benchmark execution time of this code
end do
What would be a simple way to get the execution time of such a loop, and format it in a nice way?
I have a github project that measures the performance of passing various arrays at https://github.com/jlokimlin/array_passing_performance.git
It uses the CpuTimer derived data type from https://github.com/jlokimlin/cpu_timer.git.
Usage:
use, intrinsic :: iso_fortran_env, only: &
wp => REAL64, &
ip => INT32
use type_CpuTimer, only: &
CpuTimer
type (CpuTimer) :: timer
real (wp) :: wall_clock_time
real (wp) :: total_processor_time
integer (ip) :: units = 0 ! (optional argument) = 0 for seconds, or 1 for minutes, or 2 for hours
! Starting the timer
call timer%start()
do i = 1, n
!...some big calculation...
end do
! Stopping the timer
call timer%stop()
! Reading the time
wall_clock_time = timer%get_elapsed_time(units)
total_processor_time = timer%get_total_cpu_time(units)
! Write time stamp to standard output
call timer%print_time_stamp()
! Write compiler info to standard output
call timer%print_compiler_info()
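If pulling in an external timer type is not an option, the standard intrinsics are usually enough for a loop like this. The following is a minimal sketch, not part of the CpuTimer project above: the names simple_timer and wall_time are hypothetical, and it relies only on the system_clock intrinsic with 64-bit arguments (cpu_time is the analogous intrinsic for processor time).

```fortran
module simple_timer
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Wall-clock time in seconds since an arbitrary origin.
  ! Typical use:
  !   t0 = wall_time()
  !   do i = 1, n_of_finite_elements ... end do
  !   print '(a, f12.6, a)', 'loop took ', wall_time() - t0, ' s'
  function wall_time() result(t)
    real(real64) :: t
    integer(int64) :: count, rate
    call system_clock(count, rate)
    if (rate > 0) then
      t = real(count, real64) / real(rate, real64)
    else
      t = 0.0_real64   ! no clock available on this processor
    end if
  end function wall_time
end module simple_timer
```

For a fair comparison between function pointers and hard-coded calls, it is usually worth repeating the timed loop several times and comparing the minimum or average rather than a single measurement.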

Precision problems with very large reals - Fortran

The problem I'm attempting to tackle at the moment involves computing the order of 10 modulo(n), where n could be any number less than 1000. I have a function to do exactly that, however, I am unable to obtain accurate results as the value of the order increases.
The function works correctly as long as the order is sufficiently small, but returns incorrect values for large orders. So I stuck in some output to the terminal to locate the problem, and discovered that when I use exponentiation, the accuracy of my reals is being compromised.
I declared ALL variables in the function and in the program I tested it from as real(kind=nkind) where nkind = selected_real_kind(p=18, r=308). Any numbers explicitly referenced are also declared as, for example, 1.0_nkind. However, when I print out 10**n for n counting up from 1, I find that at 10**27, the value is correct. However, 10**28 gives 9999999999999999999731564544. All higher powers are similarly distorted, and this inaccuracy is the source of my problem.
So, my question is, is there a way to work around the error? I don't know of any way to use a more extended precision than I'm already using in the calculations.
Thanks,
Sean
*EDIT: There's not much to see in the code, but here you go:
integer, parameter :: nkind = selected_real_kind(p=18, r = 308)
real(kind=nkind) function order_ten_modulo(n)
real(kind=nkind) :: n, power
power = 1.0_nkind
if (mod(n, 5.0_nkind) == 0 .or. mod(n, 2.0_nkind) == 0) then
order_ten_modulo = 0
return
end if
do
if (power>300.0) then ! Just picked this number as a safeguard against endless looping -
exit
end if
if (mod(10.0_nkind**power, n) == 1.0_nkind) then
order_ten_modulo = power
exit
end if
power = power + 1.0_nkind
end do
return
end function order_ten_modulo
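The distortion at 10**28 is what one would expect if the requested kind has a 64-bit significand: 10**27 = 2**27 * 5**27 and 5**27 is still below 2**64, whereas 5**28 is not, so 10**28 can no longer be stored exactly. One way to sidestep the problem, sketched below under the assumption that n is an ordinary integer below 1000, is to never form 10**power at all and instead carry only the remainder from one step to the next. This is an illustrative alternative, not the original function; the module name order_demo is made up.

```fortran
module order_demo
  implicit none
contains
  ! Multiplicative order of 10 modulo n, or 0 if 10 and n share a factor.
  integer function order_ten_modulo(n) result(order)
    integer, intent(in) :: n
    integer :: r, k
    order = 0
    if (n < 2) return
    if (mod(n, 2) == 0 .or. mod(n, 5) == 0) return
    r = 1
    do k = 1, n
      r = mod(r * 10, n)   ! r = 10**k mod n, always smaller than n
      if (r == 1) then
        order = k
        return
      end if
    end do
  end function order_ten_modulo
end module order_demo
```

Because the remainder stays below n, the intermediate product r * 10 stays below 10000 for n < 1000, so default integers suffice and no floating-point rounding is involved.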

FFTW: Trouble with real to complex and complex to real 2D transforms

As the title states I'm using FFTW (version 3.2.2) with Fortran 90/95 to perform a 2D FFT of real data (actually a field of random numbers). I think the forward step is working (at least I am getting some output). However I wanted to check everything by doing the IFFT to see if I can reconstruct the original input. Unfortunately when I call the complex to real routine, nothing happens and I obtain no error output, so I'm a bit confused. Here are some code snippets:
implicit none
include "fftw3.f"
! - im=501, jm=401, and lm=60
real*8 :: u(im,jm,lm),recov(im,jm,lm)
complex*8 :: cu(1+im/2,jm)
integer*8 :: planf,planb
real*8 :: dv
! - Generate array of random numbers
dv=4.0
call random_number(u)
u=u*dv
recov=0.0
k=30
! - Forward step (FFT)
call dfftw_plan_dft_r2c_2d(planf,im,jm,u(:,:,k),cu,FFTW_ESTIMATE)
call dfftw_execute_dft_r2c(planf,u(:,:,k),cu)
call dfftw_destroy_plan(planf)
! - Backward step (IFFT)
call dfftw_plan_dft_c2r_2d(planb,im,jm,cu,recov(:,:,k),FFTW_ESTIMATE)
call dfftw_execute_dft_c2r(planb,cu,recov(:,:,k))
call dfftw_destroy_plan(planb)
The above forward step seems to work (r2c) but the backward step does not seem to work. I checked this by differencing the u and recov arrays - which ended up not being zero. Additionally the max and min values of the recov array were both zero, which seems to indicate that nothing was changed.
I've looked around the FFTW documentation and based my implementation on the following page: http://www.fftw.org/fftw3_doc/Fortran-Examples.html#Fortran-Examples . I am wondering if the problem is related to indexing; at least that's the direction I am leaning. Anyway, if anyone could offer some help, that would be wonderful!
Thanks!
Not sure if this is the root of all troubles here, but the way you declare variables may be the culprit.
For most compilers (this is apparently not even standard), Complex*8 is an old syntax for single precision: the complex variable occupies a total of 8 bytes, shared between the real and the imaginary part (4+4 bytes).
[Edit 1, following Vladimir F's comment on my answer, see his link for details:] In my experience (i.e. on the systems/compilers I have used), Complex(Kind=8) corresponds to the declaration of a double precision complex number (a real and an imaginary part, each of which occupies 8 bytes).
On any system/compiler, Complex(Kind=Kind(0.d0)) should declare a double precision complex.
In short, your complex array does not have the right size. Replace occurrences of Real*8 and Complex*8 by Real(kind=8) and Complex(Kind=8) (or Real(Kind=kind(0.d0)) and Complex(Kind=kind(0.d0)) for better portability), respectively.
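The size mismatch can be checked directly with the storage_size intrinsic. The sketch below is only an illustration: the names fft_kinds_demo, sp, dp and complex_sizes are made up, and the sizes mentioned are what common compilers such as gfortran give with default options rather than something the standard guarantees. With the dfftw_ routines, cu would then be declared as complex(dp) :: cu(1+im/2,jm) and u, recov as real(dp). As a separate caveat when differencing u and recov, FFTW's transforms are unnormalized, so a forward transform followed by a backward one scales the data by im*jm.

```fortran
module fft_kinds_demo
  implicit none
  integer, parameter :: sp = kind(0.0)     ! default (single) precision
  integer, parameter :: dp = kind(0.0d0)   ! double precision
contains
  ! Report how many bits one complex number of each precision occupies.
  subroutine complex_sizes(bits_single, bits_double)
    integer, intent(out) :: bits_single, bits_double
    complex(sp) :: cs
    complex(dp) :: cd
    bits_single = storage_size(cs)   ! typically 64 bits (4 + 4 bytes)
    bits_double = storage_size(cd)   ! typically 128 bits (8 + 8 bytes)
  end subroutine complex_sizes
end module fft_kinds_demo
```

An array declared with the single precision kind therefore provides only half the storage that a double precision transform writes into.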