Related
The following Fortran function takes forever to print the Hello World 2 after printing Hello World 1.
program Test_Long_Run
implicit none
! Variables
integer,allocatable,dimension(:) :: test
integer :: i, j, k, l, m, int
allocate(test(1000*100)); test = 0
! Body of Test_Long_Run
print *, 'Hello World 1'
do k = 1,100
do j = 1,100
do i = 1,100
do m = 1,100
do l = 1,1000
test(l*m) = 2
int = 2
enddo
enddo
enddo
enddo
enddo
print *, 'Hello World 2'
end program Test_Long_Run
It runs very fast if I comment out test(l*m) = 2
The instructions in the inner body perform 100 billion multiplication and assignments to a table, plus an assignment to a variable. The array you write to contains 100 thousand integers, so does not fit into any cache. The elements you write to are not contiguous (stride: 1000*4 bytes), so each write must bypass/invalidate the cache and involve RAM. This is slow. Even hundreds of processor cycles per each assignment. If your processor runs at 2GHz, or at 2 billion instructions per second, you will make at most 10 million assignments per second. Now divide 100 billion by 10 million. 10 thousand seconds?
What happens if you remove the assignment to the array? The program will be left with an assignment to a single variable. Assignment of a constant value. A decent compiler will notice that all these loops can be thrown away and only one instruction will be left: int = 2.
Remember that a compiler has a right to modify your source code if it decides that it will not change the program's behavior.
I am converting f77 code to f90 code, and part of the code needs to sum over elements of a 3d matrix. In f77 this was accomplished by using 3 loops (over outer,middle,inner indices). I decided to use the f90 intrinsic sum (3 times) to accomplish this, and much to my surprise the answers differ. I am using the ifort compiler, have debugging, check-bounds, no optimization all turned on
Here is the f77-style code
r1 = 0.0
do k=1,nz
do j=1,ny
do i=1,nx
r1 = r1 + foo(i,j,k)
end do
end do
end do
and here is the f90 code
r = SUM(SUM(SUM(foo, DIM=3), DIM=2), DIM=1)
I have tried all sorts of variations, such as swapping the order of the loops for the f77 code, or creating temporary 2D matrices and 1D arrays to "reduce" the dimensions while using SUM, but the explicit f77 style loops always give different answers from the f90+ SUM function.
I'd appreciate any suggestions that help understand the discrepancy.
By the way this is using one serial processor.
Edited 12:13 pm to show complete example
! ifort -check bounds -extend-source 132 -g -traceback -debug inline-debug-info -mkl -o verify verify.f90
! ./verify
program verify
implicit none
integer :: nx,ny,nz
parameter(nx=131,ny=131,nz=131)
integer :: i,j,k
real :: foo(nx,ny,nz)
real :: r0,r1,r2
real :: s0,s1,s2
real :: r2Dfooxy(nx,ny),r1Dfoox(nx)
call random_seed
call random_number(foo)
r0 = 0.0
do k=1,nz
do j=1,ny
do i=1,nx
r0 = r0 + foo(i,j,k)
end do
end do
end do
r1 = 0.0
do i=1,nx
do j=1,ny
do k=1,nz
r1 = r1 + foo(i,j,k)
end do
end do
end do
r2 = 0.0
do j=1,ny
do i=1,nx
do k=1,nz
r2 = r2 + foo(i,j,k)
end do
end do
end do
!*************************
s0 = 0.0
s0 = SUM(SUM(SUM(foo, DIM=3), DIM=2), DIM=1)
s1 = 0.0
r2Dfooxy = SUM(foo, DIM = 3)
r1Dfoox = SUM(r2Dfooxy, DIM = 2)
s1 = SUM(r1Dfoox)
s2 = SUM(foo)
!*************************
print *,'nx,ny,nz = ',nx,ny,nz
print *,'size(foo) = ',size(foo)
write(*,'(A,4(ES15.8))') 'r0,r1,r2 = ',r0,r1,r2
write(*,'(A,3(ES15.8))') 'r0-r1,r0-r2,r1-r2 = ',r0-r1,r0-r2,r1-r2
write(*,'(A,4(ES15.8))') 's0,s1,s2 = ',s0,s1,s2
write(*,'(A,3(ES15.8))') 's0-s1,s0-s2,s1-s2 = ',s0-s1,s0-s2,s1-s2
write(*,'(A,3(ES15.8))') 'r0-s1,r1-s1,r2-s1 = ',r0-s1,r1-s1,r2-s1
stop
end
!**********************************************
sample output
nx,ny,nz = 131 131 131
size(foo) = 2248091
r0,r1,r2 = 1.12398225E+06 1.12399525E+06 1.12397238E+06
r0-r1,r0-r2,r1-r2 = -1.30000000E+01 9.87500000E+00 2.28750000E+01
s0,s1,s2 = 1.12397975E+06 1.12397975E+06 1.12398225E+06
s0-s1,s0-s2,s1-s2 = 0.00000000E+00-2.50000000E+00-2.50000000E+00
r0-s1,r1-s1,r2-s1 = 2.50000000E+00 1.55000000E+01-7.37500000E+00
First, welcome to StackOverflow. Please take the tour! There is a reason we expect a Minimal, Complete, and Verifiable example because we look at your code and can only guess at what might be the case and that is not too helpful for the community.
I hope the following suggestions helps you figure out what is going on.
Use the size() function and print what Fortran thinks are the sizes of the dimensions as well as printing nx, ny, and nz. As far as we know, the array is declared bigger than nx, ny, and nz and these variables are set according to the data set. Fortran does not necessarily initialize arrays to zero depending on whether it is a static or allocatable array.
You can also try specifying array extents in the sum function:
r = Sum(foo(1:nx,1:ny,1:nz))
If done like this, at least we know that the sum function is working on the exact same slice of foo that the loops loop over.
If this is the case, you will get the wrong answer even though there is nothing 'wrong' with the code. This is why it is particularly important to give that Minimal, Complete, and Verifiable example.
I can see the differences now. These are typical rounding errors from adding small numbers to a large sum. The processor is allowed to use any order of the summation it wants. There is no "right" order. You cannot really say that the original loops make the "correct" answer and the others do not.
What you can do is to use double precision. In extreme circumstances there are tricks like the Kahan summation but one rarely needs that.
Addition of a small number to a large sum is imprecise and especially so in single precision. You still have four significant digits in your result.
One typically does not use the DIM= argument, that is used in certain special circumstances.
If you want to sum all elements of foo, use just
s0 = SUM(foo)
That is enough.
What
s0 = SUM(SUM(SUM(foo, DIM=3), DIM=2), DIM=1)
does is that it will make a temporary 2D arrays with each element be the sum of the respective row in the z dimension, then a 1D array with each element the sum over the last dimension of the 2D array and then finally the sum of that 1D array. If it is done well, the final result will be the same, but it well eat a lot of CPU cycles.
The sum intrinsic function returns a processor-dependant approximation to the sum of the elements of the array argument. This is not the same thing as adding sequentially all elements.
It is simple to find an array x where
summation = x(1) + x(2) + x(3)
(performed strictly left to right) is not the best approximation for the sum treating the values as "mathematical reals" rather than floating point numbers.
As a concrete example to look at the nature of the approximation with ifort, we can look at the following program. We need to enable optimizations here to see effects; the importance of order of summation is apparent even with optimizations disabled (with -O0 or -debug).
implicit none
integer i
real x(50)
real total
x = [1.,(EPSILON(0.)/2, i=1, SIZE(x)-1)]
total = 0
do i=1, SIZE(x)
total = total+x(i)
print '(4F17.14)', total, SUM(x(:i)), SUM(DBLE(x(:i))), REAL(SUM(DBLE(x(:i))))
end do
end program
If adding up in strict order we get 1., seeing that anything smaller in magnitude than epsilon(0.) doesn't affect the sum.
You can experiment with the size of the array and order of its elements, the scaling of the small numbers and the ifort floating point compilation options (such as -fp-model strict, -mieee-fp, -pc32). You can also try to find an example like the above using double precision instead of default real.
I am working on a program to calculate minimum nozzle length in supersonic nozzles (Method of Characteristics). I can't seem to figure out why my code won't write to my output file (the lines "write (6,1000)....). My code is below:
program tester
c----------------------------------------------------------------------c
implicit real (a-h,o-z)
integer count
real kminus(0:100),kplus(0:100),theta(0:100),nu(0:100),mach(0:100)
+ ,mu(0:100)
open (5,file='tester.in')
open (6,file='tester.out')
read (5,*) me
read (5,*) maxturn
read (5,*) nchar
read (5,*) theta0
close(5)
c.....set count to 0 and calculate dtheta
count=0
dtheta=(maxturn-theta0)/nchar
c.....first characteristic
do 10 an=1,nchar+1
count=count+1
c........these are already known
theta(count)=theta0+dtheta*(an-1)
nu(count)=theta(count)
c........trivial, but we will "calculate" them anyways
kminus(count)=2*theta(count)
kplus(count)=0
c........we feed it nu(count) and get m out
call pm_hall_approx(nu(count),m)
mach(count)=m
c........does not work with sqrt(...) for some reason
mu(count)=atan((1/(mach(count)**2)-1)**0.5)
write (6,1000) count,kminus(count),kplus(count),theta(count)
+ ,nu(count),mach(count),mu(count)
10 continue
c.....the other characteristics
do 30 bn=1,nchar
do 20 cn=1,(nchar+1-bn)
count=count+1
c...........these are given
kminus(count)=kminus(cn+bn)
kplus(count)=-1*kplus(bn+1)
c...........if this is the last point, copy the previous values
if (cn.eq.(nchar+1-bn)) then
kminus(count)=kminus(count-1)
kplus(count)=kplus(count-1)
endif
c...........calculate theta and nu
theta(count)=0.5*(kminus(count)+kplus(count))
nu(count)=0.5*(kminus(count)-kplus(count))
c...........calculate m
call pm_hall_approx(nu(count),m)
mach(count)=m
mu(count)=atan((1/(mach(count)**2)-1)**0.5)
write (6,1000) count,kminus(count),kplus(count),theta(count)
+ ,nu(count),mach(count),mu(count)
20 continue
30 continue
close(6)
stop
1000 format (11(1pe12.4))
end
c======================================================================c
include 'pm_hall_approx.f'
And my subroutine is given here:
subroutine pm_hall_approx(nu,mach)
c----------------------------------------------------------------------c
c Given a Mach number, use the Hall Approximation to calculate the
c Prandtl-Meyer Function.
c----------------------------------------------------------------------c
implicit real (a-h,o-z)
c.....set constants
parameter (a=1.3604,b=0.0962,c=-0.5127,d=-0.6722,e=-0.3278)
parameter (numax=2.2769)
y=nu/numax
mach=(1+a*y+b*y*y+c*y*y*y)/(1+d*y+e*y*y)
return
end
And here are the contents of tester.in
2.4 = mache
5.0 = maxturn
7 = nchar
0.375 = theta0
In fortran, unit number 6 is reserved for screen. Never use that as the unit number. Try changing it to some other number like write(7,1000) and then it should work.
I was assigned the following problem:
Make a Fortran program which will be able to read a degree[0-360] checking validity range(not type) and it will be able to calculate and print the cos(x) from the following equation, where x is in radians:
cos(x)=1-x^2/2! + x^4/4!-x^6/6!+x^8/8!-...
As a convergence criteria assume 10^(-5) using the absolute error between two successive repeats (I suppose it means do's).
For the calculation of the ! the greatest possible kind of integer should be used. Finally the total number of repeats should be printed on screen.
So my code is this:
program ex6_pr2
implicit none
!Variables and Constants
integer::i
real*8::fact,fact2 !fact=factorial
real,parameter::pi=3.14159265
double precision::degree,radiants,cosradiants,s,oldcosradiants,difference !degree,radiants=angle
print*,'This program reads and calculates an angle`s co-sinus'
print*,'Please input the degrees of the angle'
read*,degree
do while(degree<0 .or. degree>360) !number range
read*,degree
print*,'Error input degree'
cycle
end do
radiants=(degree*pi/180)
fact=1
fact2=1
s=0
cosradiants=0
!repeat structure
do i=2,200,1
fact=fact*i
fact2=fact2*(i+2)
oldcosradiants=cosradiants
cosradiants=(-(radiants)**i/fact)+(((radiants)**(i+2))/fact2)
difference=cosradiants-oldcosradiants
s=s+cosradiants
if(abs(difference)<1e-5) exit
end do
!Printing results
print*,s+1.
end program
I get right results for angles such as 45 degrees (or pi/4) and wrong for other for example 90 degrees or 180.
I have checked my factorials where I believe the error is hidden (at least for me).
Well I created another code which seems unable to run due to the following error:FUNCTION name,(RESULT of PROJECT2_EX6~FACT),used where not expected,perhaps missing '()'
program project2_ex6
implicit none
integer(kind=3)::degrees,i,sign
integer::n
double precision::x,err_limit,s_old,s
real,parameter::pi=3.14159265359
print*,'This program calculates the cos(x)'
print*,"Enter the angle's degrees"
read*,degrees
do
if(degrees<0.or.degrees>360) then
print*,'Degrees must be between 0-360'
else
x=pi*degrees/180
exit
end if
end do
sign=1
sign=sign*(-1)
err_limit=1e-5
n=0
s=0
s_old=0
do
do i=1,n
end do
s=(((-1.)**n/(fact(2.*n)))*x**(2.*n))*sign
s=s+s_old
n=n+1
if(abs(s-s_old)<1e-5) then
exit
else
s_old=s
cycle
end if
end do
print*,s,i,n
contains
real function fact(i)
double precision::fact
integer::i
if(i>=1) then
fact=i*fact(i-1)
else
fact=1
end if
return
end function
end program
Although it is your homework, I will help you here. The first thing which is wrong is ýour factorial which you need to replace with
fact = 1
do j = 1,i
fact = fact*j
enddo
second it is easier if you let your do loop do the job so run it as
do i=4,200,2
and predefine cosradians outside the do loob with
cosradiants = 1-radiants**2/2
additionally you need to take into account the changing sign which you can do in the loop using
sign = sign*(-1)
and starting it off with sign = 1 before the loop
in the loop its then
cosradiants= cosradiants+sign*radiants**i/fact
If you have included these things it should work (at least with my code it does)
Fortran 2003 has square bracket syntax for array concatenation, Intel fortran compiler supports it too. I wrote a simple code here for matrix concatenation:
program matrix
implicit none
real,dimension (3,3) :: mat1,mat2
real,dimension(3,6):: mat3
integer i
mat1=reshape( (/1,2,3,4,5,6,7,8,9/),(/3,3/))
mat2=reshape( (/1,2,3,4,5,6,7,8,9/),(/3,3/))
mat3=[mat1,mat2]
!display
do i=1,3,1
write(*,10) mat3(i,:)
10 format(F10.4)
end do
end program
But I get error as
mat3=[mat1,mat2]
Error: Incompatible ranks 2 and 1 in assignment
I expect the output as
1 2 3 1 2 3
4 5 6 4 5 6
7 8 9 7 8 9
Can someone comment where am I going wrong? What is rank 2 and 1 here? I guess all arrays have rank 2.
The array concatenation in fortran 2003 doesn't work as you think. When you concatenate, it's not going to stack the two arrays side by side. It will pick elements from the first array one by one and put into a one-dimensional array. Then it will do the same thing with the second array but it will append this to the 1-D form of first array.
The following code works.
program matrix
implicit none
real,dimension (3,3) :: mat1,mat2
real,dimension(18) :: mat3
integer i
mat1=reshape( (/1,2,3,4,5,6,7,8,9/),(/3,3/))
mat2=reshape( (/1,2,3,4,5,6,7,8,9/),(/3,3/))
mat3=[mat1,mat2]
print*, shape([mat1,mat2]) !check shape of concatenated array
!display
do i=1,18,1
write(*,10) mat3(i)
10 format(F10.4)
end do
end program
However, the result you wanted can be achieved using following code
program matrix
implicit none
real,dimension (3,3) :: mat1,mat2
real,dimension(3,6) :: mat3
integer i
mat1=reshape( (/1,2,3,4,5,6,7,8,9/),(/3,3/))
mat2=reshape( (/1,2,3,4,5,6,7,8,9/),(/3,3/))
do i=1,3
mat3(i,:)=[mat1(:,i),mat2(:,i)]
enddo
!display
do i=1,3,1
write(*,*) mat3(i,:)
end do
end program
Another way could simply be to
mat3(:,1:3) = mat1
mat3(:,4:6) = mat2
I dont know which is faster, this or the do loop above...
Fill it using 1-D arrays then reshape your mat3.