I have some Fortran code that performs a simulation. The elapsed time is stored in et, and the timestep is stored in dt. Both are defined as type real. There is another real variable tot which holds the maximum time the simulation should run. i is a counting variable of type integer. My first attempt was like this:
real, intent(in) :: dt
real, intent(in) :: tot
real :: et
integer :: i
et = 0.0
i = 0
do
   i = i + 1
   et = real(i)*dt
   if (et > tot) exit
   ! main code here
end do
I wanted to get rid of i since it was only used in that one place. However, when I tried this, the program hangs when the total time is large:
real, intent(in) :: dt
real, intent(in) :: tot
real :: et
et = 0.0
do
   et = et + dt
   if (et > tot) exit
   ! main code here
end do
What is the difference between the two code samples that causes the program to respond so differently? My compiler is g77.
EDIT: I have added the declarations and initializations to the code samples above.
EDIT 2: The initial values passed to the subroutine are dt = 1e-6 and tot = 100.
I don't know if this is your error, since you don't give the whole program, but in the first code the first thing you do is set et equal to dt, since at that point i = 1. In the second code, however, you use et without having set it (as far as we can guess), and dt also seems to be used uninitialized. If the bytes at the memory address of et happen to represent a large negative float, it may take much longer to reach tot. That's as far as I can reason without seeing more code.
EDIT thanks for the update.
Well, in that case just read haraldkl's answer; I think that's your solution. If you need to reach 100 by adding up 1.0e-6, this isn't going to work with a 4-byte real, which only has about 6-7 meaningful digits in base 10. Your first solution is slightly better, since a 4-byte integer can reach about 2e9. One solution is to use 8-byte variables. However, you should always build in an extra check (e.g. if (et > tot .OR. i > max_iter)) to allow for a maximum number of iterations, so you can safeguard against this: even with the integer solution, if you made tot larger your integer might overflow and you would again be stuck in an infinite loop.
If dt is very small in relation to tot, it might also be that at one point dt is so small, that adding it to the, by then large, et has no effect (lost in numerical precision), and thus et does not grow beyond that point...
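The loss of precision described above is easy to demonstrate directly. A minimal single-precision sketch (using the dt = 1e-6 from the question; the stall point of about 32 follows from the roughly 24-bit significand of a 4-byte real):

```fortran
program accumulation_stall
  implicit none
  real :: et, dt
  dt = 1.0e-6
  et = 32.0
  ! At et = 32 the spacing between adjacent single-precision values
  ! is about 3.8e-6, more than twice dt, so et + dt rounds straight
  ! back to et: the loop in the question can never reach tot = 100.
  write(*,*) 'et + dt == et is ', (et + dt == et)
end program accumulation_stall
```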
It is hard to conclude anything when you give partial code, skip the declarations, and, instead of showing the error messages, give only your interpretation of them; if you knew how to interpret them correctly you would not be getting them in the first place.
Your second loop differs from the first in several ways worth noting: a) what the values of the variables are at the start of the loop, b) what the loop counter is, c) whether et is real or integer, et cetera.
Here are two ways in which these loops can be written
program various_do_loops
   integer :: i
   real :: et, tot, dt

   ! DO WHILE loop (whoops, I just now see you're using g77,
   ! so this may or may not work)
   i = 0
   et = 0.
   tot = 10.
   dt = 1.
   do while (et < tot)
      i = i + 1
      et = real(i)*dt
      ! main code goes here
      ! ....
      ! ....
      write(*,'("et is currently ",f5.2)') et
   end do

   ! Old kind of WHILE loop
   i = 0
   et = 0.
   tot = 10.
   dt = 1.
10 if (et < tot) then
      i = i + 1
      et = real(i)*dt
      ! main code goes here
      ! ....
      ! ....
      write(*,'("et is currently ",f5.2)') et
      goto 10
   end if
end program various_do_loops
Related
How to verify in Fortran whether an iterative formula of a non-linear system will converge to the root near (x,y)?
This would be easy in a programming language that supports symbolic computation, but how can it be done in Fortran? For example, taking the partial derivatives of the component functions and checking whether they are bounded near the root. I couldn't do that in Fortran, and I have no idea how to approach it. It would be a great help if anyone could give me some ideas for the following non-linear system, or if possible for the general case.
I want to use the fixed point iteration method for this case.
Main system:
x^2+y=7
x-y^2=4
Iterative form (given):
X(n+1)=\sqrt(7-Y(n)),
Y(n+1)=-\sqrt(X(n)-4),
(x0,y0)=(4.4,1.0)
Theorem (which I follow)
The issue is that I need to check the boundedness of the partial derivatives of \sqrt(7-Y) and -\sqrt(X-4) on some region around (x0,y0)=(4.4,1.0). I can write the partial derivative functions in Fortran, but how do I evaluate them at so many values and check that they are bounded around (4.4,1.0)?
Update
One possibly right solution would be to take a grid of values around (4.4,1.0), like (4.4-h,1.0-h) to (4.4+h,1.0+h), evaluate the defined partial derivative function on it, and check boundedness approximately. I haven't encountered such a problem in Fortran before, so any suggestion on that would also help me a lot.
If you just want to check the boundedness of a function on a grid, you can do something like
program verify_convergence
   implicit none

   integer, parameter :: dp = selected_real_kind(15, 307)

   real(dp) :: centre_point(2)
   real(dp) :: step_size(2)
   integer  :: no_steps(2)
   real(dp) :: point(2)
   real(dp) :: derivative(2)
   real(dp) :: threshold
   integer  :: i, j
   real(dp) :: x, y

   ! Set fixed parameters
   centre_point = [4.4_dp, 1.0_dp]
   step_size = [0.1_dp, 0.1_dp]
   no_steps = [10, 10]
   threshold = 0.1_dp

   ! Loop over a 2-D grid of points around the centre point
   do i = -no_steps(1), no_steps(1)
      do j = -no_steps(2), no_steps(2)
         ! Generate the point, calculate the derivative at that point,
         ! and stop with a message if the derivative is not bounded.
         point = centre_point + [i*step_size(1), j*step_size(2)]
         derivative = calculate_derivative(point)
         if (any(abs(derivative) > threshold)) then
            write(*,*) 'Derivative not bounded'
            stop
         endif
      enddo
   enddo

   write(*,*) 'Derivative bounded'

contains

   ! Takes a co-ordinate, and returns the derivative.
   ! Replace this with whatever function you actually need.
   function calculate_derivative(point) result(output)
      real(dp), intent(in) :: point(2)
      real(dp) :: output(2)
      output = [sqrt(7-point(2)), -sqrt(point(1)-4)]
   end function

end program
I know the function calculate_derivative doesn't do what you want it to, but I'm not sure from your question which function you actually want. Just replace this function as required.
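For what it's worth, the partial derivatives that matter for the contraction check in the question are d/dY sqrt(7-Y) = -1/(2 sqrt(7-Y)) and d/dX (-sqrt(X-4)) = -1/(2 sqrt(X-4)). A possible replacement for calculate_derivative along those lines might look like the sketch below; note the guard, since part of the grid above (x < 4) lies outside the domain of sqrt(X-4):

```fortran
   ! Sketch only: returns the two relevant partial derivatives of the
   ! iteration functions sqrt(7-Y) and -sqrt(X-4) from the question.
   function calculate_derivative(point) result(output)
      real(dp), intent(in) :: point(2)
      real(dp) :: output(2)
      if (point(2) >= 7.0_dp .or. point(1) <= 4.0_dp) then
         ! Outside the domain the derivatives are undefined/unbounded.
         output = huge(1.0_dp)
      else
         output = [ -0.5_dp/sqrt(7.0_dp - point(2)), &
                    -0.5_dp/sqrt(point(1) - 4.0_dp) ]
      end if
   end function
```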
The main question is different: how can you solve the mathematical problem without the help of any software? If you know that, we can program it in Fortran or any other language.
In particular, and assuming that n = 0, 1, 2, 3, ..., to solve your problem you need to know X(0) and Y(0). With these you calculate
X(1)=\sqrt(7-Y(0)),
Y(1)=-\sqrt(X(0)-4)
Now you know X(1) and Y(1), then you can calculate
X(2)=\sqrt(7-Y(1)),
Y(2)=-\sqrt(X(1)-4)
etc.
If your system of equations converges to something, you can check that after some number of iterations (for example n = 10, 20, 100). But by its nature Fortran will not give you the solution in symbolic form; that is not its goal.
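The hand iteration above is only a few lines of Fortran. One caveat worth knowing before coding it: starting from (4.4, 1.0), the first update gives X(1) = sqrt(6), about 2.45, which is less than 4, so the very next Y-update would take the square root of a negative number. A sketch with a domain guard makes that visible:

```fortran
program fixed_point_iteration
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: x, y, xnew, ynew
  integer :: n
  x = 4.4_dp
  y = 1.0_dp
  do n = 1, 100
    ! Both square roots must stay inside their domains.
    if (y > 7.0_dp .or. x < 4.0_dp) then
      write(*,*) 'left the domain of sqrt at iteration', n
      exit
    end if
    xnew = sqrt(7.0_dp - y)     ! X(n+1) = sqrt(7 - Y(n))
    ynew = -sqrt(x - 4.0_dp)    ! Y(n+1) = -sqrt(X(n) - 4)
    x = xnew
    y = ynew
    write(*,'(i4,2f14.8)') n, x, y
  end do
end program fixed_point_iteration
```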
I am employing the Knuth algorithm to generate a random permutation
of an n-tuple. This is the code. For fixed n, it generates random permutations and collects the distinct ones until it has found all n! permutations. At the end it also prints the number of trials needed to find all the permutations. I have also inserted initialization of the seed from the time (in a very simple and naive way, though). There are two options (A and B). A: the seed is set once and for all in the main program. B: the seed is set every time a random permutation is computed (below, the second option is commented out).
implicit none
integer :: n, ncomb
integer :: i, h, k, x
integer, allocatable :: list(:), collect(:,:)
logical :: found
integer :: trials
!
! A
!
integer :: z, values(1:8)
integer, dimension(:), allocatable :: seed

call date_and_time(values=values)
call random_seed(size=z)
allocate(seed(1:z))
seed(:) = values(8)
call random_seed(put=seed)

n = 4
ncomb = product((/(i, i=1,n)/))
allocate(list(n))
allocate(collect(n,ncomb))

trials = 0
h = 0
do
   trials = trials + 1
   list = Shuffle(n)
   found = .false.
   do k = 1, h
      x = sum(abs(list-collect(:,k)))
      if ( x == 0 ) then
         found = .true.
         exit
      end if
   end do
   if ( .not. found ) then
      h = h + 1
      collect(:,h) = list
      print*, h, ')', collect(:,h)
   end if
   if ( h == ncomb ) exit
end do
write(*,*) "Trials= ", trials

contains

   function Shuffle(n) result(list)
      integer, allocatable :: list(:)
      integer, intent(in) :: n
      integer :: i, randpos, temp, h
      real :: r
      !
      ! B
      !
      ! integer :: z, values(1:8)
      ! integer, dimension(:), allocatable :: seed
      ! call date_and_time(values=values)
      ! call random_seed(size=z)
      ! allocate(seed(1:z))
      ! seed(:) = values(8)
      ! call random_seed(put=seed)
      allocate(list(n))
      list = (/ (h, h=1,n) /)
      do i = n, 2, -1
         call random_number(r)
         randpos = int(r * i) + 1
         temp = list(randpos)
         list(randpos) = list(i)
         list(i) = temp
      end do
   end function Shuffle

end
You can check that the second option is not good at all. For n=4 it takes around 100 times more trials to obtain all the permutations, and for n=5 it gets stuck.
My questions are:
Why does calling random_seed multiple times give wrong results? What kind of systematic error am I introducing? Isn't it equivalent to calling random_seed only once but launching the code several times (each time generating only one random permutation)?
If I want to launch the code several times, computing a single permutation each time, I guess that if I initialize the random seed I have the same problem (regardless of the position of the initialization, since now I am computing only one permutation). Correct? In this case, what do I have to do to initialize the seed without spoiling the uniform sampling? If I do not initialize the seed randomly I obtain the same permutation every time. I guess I could print the seed and read it back each time I launch the code, so as not to start from the same pseudo-random numbers; however, this is complicated to do if I launch several instances of the code in parallel.
UPDATE
I have understood the reply. In conclusion, if I want to initialize the seed so that each run generates different pseudorandom numbers, I can do the following:
A) Old gfortran
Use the subroutine init_random_seed() here
https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gfortran/RANDOM_005fSEED.html
B) Most recent gfortran versions
call random_seed()
C) Fortran2018
call random_init(repeatable, image_distinct)
Questions
In the C) case, should I set repeatable=.false. and image_distinct=.true. to get a different random sequence each time?
What would be an efficient way to write the code portably, so that it works whatever the compiler is? (I mean, the code detects what is available and acts accordingly.)
You certainly should not call random_seed() repeatedly. It is supposed to be called just once. You made the problem worse by setting the seed in such a crude way.
Yes, one does often use the date and time to initialize it, but one must pre-process the time data through something that adds some entropy, such as a very simple random generator. A good example can be found in the documentation of RANDOM_SEED for older versions of gfortran: https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gfortran/RANDOM_005fSEED.html See how lcg() is used there to transform the date_and_time() data.
Note that more recent versions of gfortran will generate a random seed that is different every time just by calling random_seed() without any arguments. Older versions returned the same seed every time.
Also note that Fortran 2018 has random_init() where you can specify repeatable= to be true or false. With false you get a different sequence every time.
The portable thing is to use standard Fortran, that's all. But you cannot use new features and old compiler versions at the same time. This kind of portability does not exist. With old compilers you can only use old standard features. I won't even start writing about autoconf and stuff, it is not worth it.
So,
you can set your random number seed to be the same every time or distinct every time (see above),
and
you should always call random_seed or random_init only once.
Why does calling random_seed multiple times give wrong results?
You are re-starting the pseudorandom sequence from some unspecified state, probably with insufficient entropy, quite possibly very close to the previous starting state.
What kind of systematic error I am introducing? Isn't it equivalent to calling random seed only once but launching the code several times (each time generating only one random permutation)?
It might be similar. But your seeding from the time is far too naive, and when running in a loop the date and time are far too similar, if not completely equal, in most bits. A transform like the one linked above might mask that problem, but putting the date and time itself into the seed is just not going to work.
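For reference, a sketch of the kind of transform meant here, modelled on the lcg-based init_random_seed example in the linked gfortran documentation (the LCG constants are the ones used there; treat the packing of the time fields as illustrative):

```fortran
subroutine init_random_seed()
  implicit none
  integer :: i, n, t(8)
  integer, allocatable :: seed(:)
  integer(kind=8) :: s
  call random_seed(size=n)
  allocate(seed(n))
  call date_and_time(values=t)
  ! Pack the time-of-day fields (ms, s, min, h) into one integer ...
  s = t(8) + 1000_8*(t(7) + 60_8*(t(6) + 60_8*t(5)))
  if (s == 0) s = 104729_8   ! avoid the LCG fixed point at zero
  ! ... then run a simple linear congruential generator over it so
  ! that nearby times still yield very different seed arrays.
  do i = 1, n
    s = mod(s*279470273_8, 4294967291_8)
    seed(i) = int(mod(s, int(huge(0), kind=8)))
  end do
  call random_seed(put=seed)
end subroutine init_random_seed
```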
I'm having some trouble when executing a program with a parallel do. Here is a test code.
module test
   use, intrinsic :: iso_fortran_env, only: dp => real64
   implicit none
contains
   subroutine Addition(x,y,s)
      real(dp), intent(in) :: x, y
      real(dp), intent(out) :: s
      s = x + y
   end subroutine Addition

   function linspace(length,xi,xf) result (vec)
      ! function to create an equally spaced vector given a begin and end point
      real(dp), intent(in) :: xi, xf
      integer, intent(in) :: length
      real(dp), dimension(1:length) :: vec
      integer :: i
      real(dp) :: increment

      increment = (xf-xi)/(real(length)-1)
      vec(1) = xi
      do i = 2, length
         vec(i) = vec(i-1) + increment
      end do
   end function linspace
end module test

program paralleltest
   use, intrinsic :: iso_fortran_env, only: dp => real64
   use test
   use :: omp_lib
   implicit none
   integer, parameter :: length = 1000
   real(dp), dimension(length) :: x, y
   real(dp) :: s
   integer :: i, j
   integer :: num_threads = 8
   real(dp), dimension(length,length) :: SMatrix

   x = linspace(length, .0d0, 1.0d0)
   y = linspace(length, 2.0d0, 3.0d0)

   !$ call omp_set_num_threads(num_threads)
   !$OMP PARALLEL DO
   do i = 1, size(x)
      do j = 1, size(y)
         call Addition(x(i), y(j), s)
         SMatrix(i,j) = s
      end do
   end do
   !$OMP END PARALLEL DO

   open(unit=1, file='Add6.dat')
   do i = 1, size(x)
      do j = 1, size(y)
         write(1,*) x(i), ";", y(j), ";", SMatrix(i,j)
      end do
   end do
   close(unit=1)
end program paralleltest
I'm running the program as follows: gfortran-8 -fopenmp paralleltest.f03 -o pt.out -mcmodel=medium, and then export OMP_NUM_THREADS=8.
This simple code raises at least two big questions for me about parallel do. The first is that if I run it with length = 1100 or greater, I get a Segmentation fault (core dumped) error message, but with smaller values it runs with no problem. The second is about the time it takes. When I run it with length = 1000 (timed with time ./pt.out), it takes 1.732s, but if I run it sequentially (without the -fopenmp flag, and with taskset -c 4 time ./pt.out) it takes 1.714s. I guess the difference between the two would show up in longer and more complex code, where parallelism is more useful. In fact, when I tried it with more complex calculations running in parallel with eight threads, the time was cut in half compared to sequential, but not to an eighth as I expected. In view of this, my questions are: is a speedup always available, or is it code dependent? And second, is there a friendly way to control which thread runs which iteration? That is, the first thread runs the first length/8 iterations, and so on, as if performing several tasksets with different code, where each contains the iterations I want.
As I commented, the Segmentation fault has been treated elsewhere (Why Segmentation fault is happening in this openmp code?). I would use an allocatable array, but you can also raise the stack size using ulimit -s.
Regarding the time, almost all of the runtime is spent in writing the array to the external file.
But even if you remove that and measure only the time spent in the parallel section using omp_get_wtime(), and increase the problem size, it still does not scale too well. This is because there is very little computation for the CPU to do and a lot of writing of the array to memory (accessing main memory is slow: cache misses).
As Jean-Claude Arbaut pointed out, your loop order is wrong and makes accessing the memory even slower. Some compilers can change that for you with higher optimization levels (-O2 or -O3), but only some of them.
And even worse, as Jim Cownie pointed out, you have a race condition. Multiple threads try to use the same s for both reading and writing and the program is invalid. You need to make s private using private(s).
With the above fixes I get a roughly two times faster parallel section with four cores and four threads. Don't try to use hyper-threading; it slows the program down.
If you give the CPU more computational work to do, like s = Bessel_J0(x)/Bessel_J1(y), it scales pretty well for me: almost four times faster with four threads, and hyper-threading does speed it up a little bit.
Finally, I suggest removing the manual setting of the number of threads; it is a pain for testing. If you remove it, you can simply use OMP_NUM_THREADS=4 ./a.out.
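Putting those fixes together, the parallel section might look like the sketch below (same declarations as in the question, plus two timing variables; s made private, loop order swapped so the inner index i runs down the columns of SMatrix, and omp_get_wtime() timing only the parallel part):

```fortran
   real(dp) :: t0, t1

   t0 = omp_get_wtime()
   !$OMP PARALLEL DO PRIVATE(i, j, s)
   do j = 1, size(y)
      do i = 1, size(x)
         ! Fortran is column-major: with i innermost, consecutive
         ! iterations write consecutive elements of SMatrix(:,j).
         call Addition(x(i), y(j), s)
         SMatrix(i,j) = s
      end do
   end do
   !$OMP END PARALLEL DO
   t1 = omp_get_wtime()
   write(*,*) 'parallel section took ', t1 - t0, ' seconds'
```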
I'm trying to use the intrinsic function CEILING, but rounding error sometimes makes it hard to get what I want. The sample code is very simple:
PROGRAM MAIN
   IMPLICIT NONE
   INTEGER, PARAMETER :: ppm_kind_double = KIND(1.0D0)
   REAL(ppm_kind_double) :: before, after, dx
   before = -0.112
   dx = 0.008
   after = CEILING(before/dx)
   WRITE(*,*) before, dx, before/dx, after
END
And I got these results (output screenshot omitted; before/dx prints as just above -14, so CEILING yields -13):
The values I give to 'before' and 'dx' in the code are just for demonstration. For cases where before/dx is exactly -13.5, say, I want CEILING to give -13. But in the case shown above I actually want to get -14. I have considered using a test like
IF(ABS(NINT(before/dx) - before/dx) < 0.001)
But that's simply not elegant. Is there a better way to do this?
Update:
I was surprised to find that the problem does not occur if I give the constants the ppm_kind_double kind. So I guess this 'rounding error' only happens when the machine's rounding precision carries more digits than what is stored in ppm_kind_double. I actually run my program (not this demo code) on a cluster whose machine precision I don't know, so maybe it's quad precision on that machine that leads to the problem?
After I set constants to double precision:
before = -0.112_ppm_kind_double
dx = 0.008_ppm_kind_double
This is a bit tricky, because you never know where the rounding error comes from. If dx were just a tiny bit larger than 0.008, the division before/dx might still round to the same value, but now -13 would be the correct answer.
That said, the most common method I have seen is to nudge the value ever so slightly in the opposite direction. Something like this:
program sign_test
   use iso_fortran_env
   implicit none
   real(kind=real64) :: a, b
   integer(kind=int32) :: c

   a = -0.112
   b = 0.008
   c = my_ceiling(a/b)
   print*, a, b, c

contains

   function my_ceiling(v)
      implicit none
      real(kind=real64), intent(in) :: v
      integer(kind=int32) :: my_ceiling
      my_ceiling = ceiling(v - 1d-6, kind=int32)
   end function my_ceiling

end program sign_test
This has no impact on the vast majority of values, but there are now a few values that will be rounded up by more than intended.
Note: if your reals are notionally "exact" to a specified precision, you might do something like this:
after=nint(1000*before)/nint(1000*dx)
This works for your example. You haven't said what you'd expect when both values are positive, and so on, so you may need to adapt it a bit.
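Spelled out with the values from the question (and assuming, as above, that the data are meant to be exact to three decimals): round both operands to integer thousandths first, then divide. One caveat is flagged in a comment, since Fortran's integer division truncates toward zero:

```fortran
program scaled_division
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: before, dx
  before = -0.112_dp
  dx = 0.008_dp
  ! nint(1000*before) = -112 and nint(1000*dx) = 8, so the division
  ! is exact here. Beware that for a non-integral quotient, integer
  ! division truncates toward zero: that matches CEILING only when
  ! the quotient is negative, and FLOOR when it is positive.
  write(*,*) nint(1000*before), nint(1000*dx), nint(1000*before)/nint(1000*dx)
end program scaled_division
```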
I need to benchmark a part of a fortran program to understand and quantify the impact of specific changes (in order to make the code more maintainable we'd like to make it more OO, taking advantage of function pointers for example).
I have a loop that calls the same subroutines several times to perform computations on finite elements. I want to see the impact of using function pointers instead of hard-coded calls.
do i=1,n_of_finite_elements
! Need to benchmark execution time of this code
end do
What would be a simple way to get the execution time of such a loop and format it in a nice way?
I have a github project that measures the performance of passing various arrays at https://github.com/jlokimlin/array_passing_performance.git
It uses the CpuTimer derived data type from https://github.com/jlokimlin/cpu_timer.git.
Usage:
use, intrinsic :: iso_fortran_env, only: &
    wp => REAL64, &
    ip => INT32

use type_CpuTimer, only: &
    CpuTimer

type (CpuTimer) :: timer
real (wp)    :: wall_clock_time
real (wp)    :: total_processor_time
integer (ip) :: units = 0 ! (optional argument) 0 for seconds, 1 for minutes, or 2 for hours

! Start the timer
call timer%start()

do i = 1, n
   !...some big calculation...
end do

! Stop the timer
call timer%stop()

! Read the time
wall_clock_time = timer%get_elapsed_time(units)
total_processor_time = timer%get_total_cpu_time(units)

! Write the time stamp to standard output
call timer%print_time_stamp()

! Write compiler info to standard output
call timer%print_compiler_info()
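If you would rather avoid the external dependency, the standard intrinsics system_clock (wall time) and cpu_time (processor time) are enough for a rough measurement. A minimal self-contained sketch (the loop body is just a stand-in for the finite-element calls):

```fortran
program benchmark_loop
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer(kind=8) :: c0, c1, rate
  real(dp) :: t0, t1, x
  integer :: i

  x = 0.0_dp
  call system_clock(c0, rate)
  call cpu_time(t0)
  do i = 1, 10000000
     x = x + sin(real(i, dp))   ! stand-in for the real work
  end do
  call cpu_time(t1)
  call system_clock(c1)

  write(*,'(a,f8.3,a)') 'wall time: ', real(c1-c0, dp)/real(rate, dp), ' s'
  write(*,'(a,f8.3,a)') 'cpu time:  ', t1 - t0, ' s'
  write(*,*) x   ! use x so the loop is not optimized away
end program benchmark_loop
```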