Is there a simple and quick way to multiply each column of a matrix by the corresponding element of a vector? We can do this explicitly:
program test
implicit none
integer :: i
integer :: x(3,3), y(3), z(3,3)
x = reshape([(i,i=1,9)],[3,3])
y = [1,2,3]
do i=1,3
z(:,i) = x(:,i) * y(i)
print *, z(:,i)
enddo
end program test
Is there a way to perform the do loop in one line? For example, in Python with NumPy we can do the job in one shot:
z = np.einsum('ij,i->ij',x,y)
#or
z = x*y[:,None]
Try
z = x * spread(y,1,3)
and if that doesn't work (no Fortran on this computer so I haven't checked) fiddle around with spread until it does. In practice you'll probably want to replace the 3 by size(x,1) or suchlike.
I expect that this will cause the compiler to create temporary arrays, and I expect it will be easy to find situations where it underperforms the explicit looping scheme in the question. 'Neat' one-liners often have a cost in both time and space, and often the tried-and-trusted Fortran approach of explicit looping is the one to go with.
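A quick way to check the spread one-liner against the explicit loop is sketched below. The names spread_check, z_loop and z_spread are only illustrative, and the array sizes are the ones from the question.

```fortran
program spread_check
  implicit none
  integer :: i
  integer :: x(3,3), y(3), z_loop(3,3), z_spread(3,3)

  x = reshape([(i, i = 1, 9)], [3, 3])
  y = [1, 2, 3]

  ! Reference result: scale column i of x by y(i).
  do i = 1, size(x, 2)
    z_loop(:, i) = x(:, i) * y(i)
  end do

  ! spread copies y along dimension 1, so element (j,i) of the result is y(i).
  z_spread = x * spread(y, dim=1, ncopies=size(x, 1))

  print *, 'one-liner matches loop: ', all(z_spread == z_loop)
end program spread_check
```

Using size(x, 1) for ncopies keeps the expression valid if the shape of x changes.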
Why replace clear, easy-to-read code with garbage?
program test
implicit none
integer i,j
integer :: x(3,3), y(3), z(3,3)
x = reshape([(i,i=1,9)],[3,3])
y = [1,2,3]
z = reshape ([((x(j,i)*y(i) ,j=1,3),i=1,3)], [3,3])
print *, z(1,:)
print *, z(2,:)
print *, z(3,:)
end program test
I am converting f77 code to f90 code, and part of the code needs to sum over the elements of a 3D array. In f77 this was accomplished with 3 loops (over the outer, middle and inner indices). I decided to use the f90 intrinsic sum (3 times) to accomplish this, and much to my surprise the answers differ. I am using the ifort compiler with debugging and bounds checking turned on and optimization turned off.
Here is the f77-style code
r1 = 0.0
do k=1,nz
do j=1,ny
do i=1,nx
r1 = r1 + foo(i,j,k)
end do
end do
end do
and here is the f90 code
r = SUM(SUM(SUM(foo, DIM=3), DIM=2), DIM=1)
I have tried all sorts of variations, such as swapping the order of the loops for the f77 code, or creating temporary 2D matrices and 1D arrays to "reduce" the dimensions while using SUM, but the explicit f77 style loops always give different answers from the f90+ SUM function.
I'd appreciate any suggestions that help understand the discrepancy.
By the way this is using one serial processor.
Edited 12:13 pm to show complete example
! ifort -check bounds -extend-source 132 -g -traceback -debug inline-debug-info -mkl -o verify verify.f90
! ./verify
program verify
implicit none
integer :: nx,ny,nz
parameter(nx=131,ny=131,nz=131)
integer :: i,j,k
real :: foo(nx,ny,nz)
real :: r0,r1,r2
real :: s0,s1,s2
real :: r2Dfooxy(nx,ny),r1Dfoox(nx)
call random_seed
call random_number(foo)
r0 = 0.0
do k=1,nz
do j=1,ny
do i=1,nx
r0 = r0 + foo(i,j,k)
end do
end do
end do
r1 = 0.0
do i=1,nx
do j=1,ny
do k=1,nz
r1 = r1 + foo(i,j,k)
end do
end do
end do
r2 = 0.0
do j=1,ny
do i=1,nx
do k=1,nz
r2 = r2 + foo(i,j,k)
end do
end do
end do
!*************************
s0 = 0.0
s0 = SUM(SUM(SUM(foo, DIM=3), DIM=2), DIM=1)
s1 = 0.0
r2Dfooxy = SUM(foo, DIM = 3)
r1Dfoox = SUM(r2Dfooxy, DIM = 2)
s1 = SUM(r1Dfoox)
s2 = SUM(foo)
!*************************
print *,'nx,ny,nz = ',nx,ny,nz
print *,'size(foo) = ',size(foo)
write(*,'(A,4(ES15.8))') 'r0,r1,r2 = ',r0,r1,r2
write(*,'(A,3(ES15.8))') 'r0-r1,r0-r2,r1-r2 = ',r0-r1,r0-r2,r1-r2
write(*,'(A,4(ES15.8))') 's0,s1,s2 = ',s0,s1,s2
write(*,'(A,3(ES15.8))') 's0-s1,s0-s2,s1-s2 = ',s0-s1,s0-s2,s1-s2
write(*,'(A,3(ES15.8))') 'r0-s1,r1-s1,r2-s1 = ',r0-s1,r1-s1,r2-s1
stop
end
!**********************************************
sample output
nx,ny,nz = 131 131 131
size(foo) = 2248091
r0,r1,r2 = 1.12398225E+06 1.12399525E+06 1.12397238E+06
r0-r1,r0-r2,r1-r2 = -1.30000000E+01 9.87500000E+00 2.28750000E+01
s0,s1,s2 = 1.12397975E+06 1.12397975E+06 1.12398225E+06
s0-s1,s0-s2,s1-s2 = 0.00000000E+00-2.50000000E+00-2.50000000E+00
r0-s1,r1-s1,r2-s1 = 2.50000000E+00 1.55000000E+01-7.37500000E+00
First, welcome to StackOverflow. Please take the tour! There is a reason we expect a Minimal, Complete, and Verifiable example: without one we can only look at your code and guess at what might be the case, and that is not very helpful for the community.
I hope the following suggestions help you figure out what is going on.
Use the size() function and print what Fortran thinks the sizes of the dimensions are, as well as printing nx, ny, and nz. For all we know, the array is declared larger than nx, ny, and nz, and these variables are set according to the data set. Fortran does not necessarily initialize arrays to zero; whether that happens can depend on whether the array is static or allocatable.
You can also try specifying array extents in the sum function:
r = Sum(foo(1:nx,1:ny,1:nz))
If done like this, at least we know that the sum function is working on the exact same slice of foo that the loops loop over.
If the array is in fact larger than the region the loops cover, you will get the wrong answer even though there is nothing 'wrong' with the code. This is why it is particularly important to give that Minimal, Complete, and Verifiable example.
I can see the differences now. These are typical rounding errors from adding small numbers to a large sum. The processor is allowed to use any order of the summation it wants. There is no "right" order. You cannot really say that the original loops make the "correct" answer and the others do not.
What you can do is to use double precision. In extreme circumstances there are tricks like the Kahan summation but one rarely needs that.
Addition of a small number to a large sum is imprecise and especially so in single precision. You still have four significant digits in your result.
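To make the comparison concrete, here is a minimal sketch that sums the same single-precision data three ways: a naive running sum, a double-precision accumulator, and Kahan summation. The program name, the array size n and the variable names are assumptions for illustration. Aggressive value-unsafe optimization flags may optimize the Kahan correction away, so treat the printed numbers as indicative only.

```fortran
program sum_compare
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000000   ! illustrative size, not from the question
  real, allocatable :: a(:)
  real :: naive, kahan, c, t, yk
  real(dp) :: wide
  integer :: i

  allocate(a(n))
  call random_number(a)

  ! Naive single-precision running sum.
  naive = 0.0
  do i = 1, n
    naive = naive + a(i)
  end do

  ! Same data, accumulated in double precision.
  wide = 0.0_dp
  do i = 1, n
    wide = wide + real(a(i), dp)
  end do

  ! Kahan summation: c carries the low-order bits lost in each addition.
  kahan = 0.0
  c = 0.0
  do i = 1, n
    yk = a(i) - c
    t = kahan + yk
    c = (t - kahan) - yk
    kahan = t
  end do

  print *, 'naive single: ', naive
  print *, 'kahan single: ', kahan
  print *, 'double      : ', wide
end program sum_compare
```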
One typically does not use the DIM= argument; it is meant for certain special circumstances.
If you want to sum all elements of foo, use just
s0 = SUM(foo)
That is enough.
What
s0 = SUM(SUM(SUM(foo, DIM=3), DIM=2), DIM=1)
does is make a temporary 2D array in which each element is the sum of foo along the third (z) dimension, then a 1D array in which each element is the sum over the last dimension of that 2D array, and finally the sum of that 1D array. If it is done well, the final result will be the same, but it will eat a lot of CPU cycles.
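The chain of reductions can be seen by printing the shape at each stage. This is only a sketch: the small integer array and the program name sum_dims are made up for illustration, and integers are used so that both totals agree exactly.

```fortran
program sum_dims
  implicit none
  integer :: foo(2,3,4)

  foo = 1

  ! Each DIM= reduction removes one dimension from the result.
  print *, 'shape after dim=3        :', shape(sum(foo, dim=3))
  print *, 'shape after dim=3, dim=2 :', shape(sum(sum(foo, dim=3), dim=2))

  ! The nested form and the plain form give the same total for integers.
  print *, 'nested total :', sum(sum(sum(foo, dim=3), dim=2), dim=1)
  print *, 'sum(foo)     :', sum(foo)
end program sum_dims
```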
The sum intrinsic function returns a processor-dependent approximation to the sum of the elements of the array argument. This is not the same thing as adding all the elements sequentially.
It is simple to find an array x where
summation = x(1) + x(2) + x(3)
(performed strictly left to right) is not the best approximation for the sum treating the values as "mathematical reals" rather than floating point numbers.
As a concrete example of the nature of the approximation with ifort, we can look at the following program. We need to enable optimizations here to see SUM differ from the loop; the importance of the order of summation, though, is apparent even with optimizations disabled (with -O0 or -debug).
implicit none
integer i
real x(50)
real total
x = [1.,(EPSILON(0.)/2, i=1, SIZE(x)-1)]
total = 0
do i=1, SIZE(x)
total = total+x(i)
print '(4F17.14)', total, SUM(x(:i)), SUM(DBLE(x(:i))), REAL(SUM(DBLE(x(:i))))
end do
end program
If adding up in strict order we get 1., seeing that anything smaller in magnitude than epsilon(0.) doesn't affect the sum.
You can experiment with the size of the array and order of its elements, the scaling of the small numbers and the ifort floating point compilation options (such as -fp-model strict, -mieee-fp, -pc32). You can also try to find an example like the above using double precision instead of default real.
So I am using the Taylor series to calculate sin(0.75) in Fortran 90 up until a certain point, so I need to run it in a do while loop (until my condition is met). This means I will need to use a factorial. Here's my code:
program taylor
implicit none
real :: x = 0.75
real :: y
integer :: i = 3
do while (abs(y - sin(0.75)) > 10.00**(-7))
i = i + 2
y = x - ((x**i)/fact(i))
print *, y
end do
end program taylor
Where I've written fact(i) is where I'll need the factorial. Unfortunately, Fortran doesn't have an intrinsic ! function. How would I implement the function in this program?
Thanks.
The following simple function answers your question. Note how it returns a real, not an integer. If performance is not an issue, then this is fine for the Taylor series.
real function fact(n)
integer, intent(in) :: n
integer :: i
if (n < 0) error stop 'factorial is singular for negative integers'
fact = 1.0
do i = 2, n
fact = fact * i
enddo
end function fact
But the real answer is that Fortran 2008 does have an intrinsic function for the factorial: the Gamma function. For a positive integer n, it is defined such that Gamma(n+1) == fact(n).
(I can imagine the Gamma function is unfamiliar. It's a generalization of the factorial function: Gamma(x) is defined for all complex x except the non-positive integers, although the Fortran intrinsic accepts only real arguments. The offset in the definition is for historical reasons and unnecessarily confusing if you ask me.)
In some cases you may want to convert the output of the Gamma function to an integer. If so, make sure you use "long integers" via INT(Gamma(n+1), kind=INT64) with the USE, INTRINSIC :: ISO_Fortran_env declaration. This is a precaution against factorials becoming quite large. And, as always, watch out for mixed-mode arithmetic!
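A minimal sketch of that conversion follows. It rounds with NINT rather than truncating with INT; that choice is my assumption, made in case the Gamma result lands a hair below the exact integer. The program name and the loop bounds are illustrative.

```fortran
program gamma_factorial
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer :: n
  integer(int64) :: f

  do n = 0, 15
    ! gamma takes a real argument; nint rounds away any last-bit error
    ! before the conversion to a 64-bit integer.
    f = nint(gamma(real(n + 1, real64)), kind=int64)
    print '(i3, a, i0)', n, '! = ', f
  end do
end program gamma_factorial
```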
Here's another method to compute n! in one line using only intrinsic functions:
product((/(i,i=1,n)/))
Of course, i must be declared as an integer beforehand. The expression creates an array that runs from 1 to n and takes the product of all its elements. Bonus: it even gives the correct result for n = 0, because the product of a zero-size array is 1.
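As a small illustration (the program name and the range of n are assumptions), the one-liner can be exercised like this. With default 32-bit integers the result overflows beyond 12!, so the loop stays well below that.

```fortran
program product_factorial
  implicit none
  integer :: i, n

  do n = 0, 5
    ! For n = 0 the implied do produces a zero-size array, whose product is 1.
    print '(i0, a, i0)', n, '! = ', product([(i, i = 1, n)])
  end do
end program product_factorial
```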
You do NOT want to use a factorial function for your Taylor series. That would mean computing the same terms over and over. You should just multiply the factorial variable in each loop iteration. Don't forget to use a real, because an integer will overflow quickly.
See the answer under the question of your schoolmate Program For Calculating Sin Using Taylor Expansion Not Working?
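A sketch of that idea is given below. Each term is built from the previous one, so no factorial is ever computed explicitly. The stopping rule (quit once the latest term is below a tolerance) is my assumption and differs from the comparison against sin(0.75) in the question. The names tol, term and total are illustrative.

```fortran
program taylor_sin
  implicit none
  real, parameter :: x = 0.75
  real, parameter :: tol = 1.0e-7   ! assumed stopping tolerance
  real :: term, total
  integer :: k

  term = x          ! first term of the series, x**1/1!
  total = term
  k = 1
  do while (abs(term) > tol)
    ! Next term from the previous one: multiply by -x**2/((k+1)*(k+2)).
    term = -term * x * x / real((k + 1) * (k + 2))
    total = total + term
    k = k + 2
  end do

  print *, 'series    :', total
  print *, 'intrinsic :', sin(x)
end program taylor_sin
```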
Can you write the equation which gives factorial?
It may look something like this
FUNCTION Bang(N)
IMPLICIT NONE
INTEGER, INTENT(IN) :: N
INTEGER :: I
INTEGER :: Bang
! Not PURE: the error branch below uses WRITE and STOP,
! which are not allowed in a pure procedure.
IF(N < 0) THEN
WRITE(*,*)'Error in Bang function N=',N
STOP
ENDIF
! 0! = 1! = 1; the loop body is skipped when N < 2.
! A default integer overflows for N > 12.
Bang = 1
DO I = 2, N
Bang = Bang * I
ENDDO
END FUNCTION Bang
I have found many questions that turn around this issue, but none that directly answer the question:
In Fortran, what are (a) the fastest (wall clock) and (b) the most elegant (concise and clear) ways to eliminate duplicates from a list of integers?
There has to be a better way than my feeble attempt:
Program unique
implicit none
! find "indices", the list of unique numbers in "list"
integer( kind = 4 ) :: kx, list(10)
integer( kind = 4 ),allocatable :: indices(:)
logical :: mask(10)
!!$ list=(/3,2,5,7,3,1,4,7,3,3/)
list=(/1,(kx,kx=1,9)/)
mask(1)=.true.
do kx=10,2,-1
mask(kx)= .not.(any(list(:kx-1)==list(kx)))
end do
indices=pack([(kx,kx=1,10)],mask)
print *,indices
End Program unique
My attempt expects the list to be ordered, but it would be better if that requirement were lifted
I just couldn't help myself, so I wrote up an answer you may enjoy. The following code will return an array of unique values in ascending order for an input array of unsorted integers. Note that the output results are the actual values, not just the indices.
program unique_sort
implicit none
integer :: i = 0, min_val, max_val
integer, dimension(10) :: val, unique
integer, dimension(:), allocatable :: final
val = [ 3,2,5,7,3,1,4,7,3,3 ]
min_val = minval(val)-1
max_val = maxval(val)
do while (min_val<max_val)
i = i+1
min_val = minval(val, mask=val>min_val)
unique(i) = min_val
enddo
allocate(final(i), source=unique(1:i)) !<-- Or, just use unique(1:i)
print "(10i5:)", final
end program unique_sort
! output: 1 2 3 4 5 7
See this gist for timing comparisons between (unique_sort) above, your example (unique_indices), and the example at Rosetta Code (remove_dups) as well as a couple of variations. I'd like to test @High Performance Mark's code but haven't yet.
Run program 1,000,000 times, 100 integers 0<=N<=50
- unique_sort t~2.1 sec input: unsorted, w/duplicates output: sorted unique values
- remove_dup t~1.4 input: unsorted, w/duplicates output: unsorted unique values
- unique_indices t~1.0 input: sorted, w/duplicates output: unsorted indices for unique values
- BONUS!(Python) t~4.1 input: unsorted, w/duplicates output: sorted unique values
Bottom line: on my machine (i7 8GB laptop) unique_indices is slightly faster than remove_dups. However, remove_dups does not require the input array to be pre-sorted, and actually returns the values rather than the indices (see the gist for a modified version of unique_indices that returns the values instead, which doesn't seem to slow it down much at all).
On the other hand, unique_sort takes around twice as long, but is designed to handle unsorted input, and also returns the values in sorted order, in 8 LOC (minus the var declarations). So that seems a fair trade-off. Anywho, I'm sure unique_sort can be optimized for greater speed using some sort of masking statement, but that's for another day.
Update
The timings shown above were obtained from a test program where each subroutine was placed in a module and executed via a procedure call. However, I found a surprisingly large improvement in performance when unique_sort was placed directly in the main program, completing in only ~0.08 sec for 1 million runs. A speedup of ~25x simply by not using a procedure seems strange to me - ordinarily, I assume that the compiler optimizes the cost of procedure calls away. For example, I found no difference in performance for remove_dup or unique_indices whether they were executed via a procedure or placed directly in the main program.
After @VladimirF pointed out that I was overcomparing, I found I could vectorize my original code (remove the do loop do kx....). I have coupled the "unique" function with a mergesort algorithm loosely based on Wikipedia. The guts are contained in module SortUnique:
Module SortUnique
contains
Recursive Subroutine MergeSort(temp, Begin, Finish, list)
! 1st 3 arguments are input, 4th is output sorted list
implicit none
integer(kind=4),intent(inout) :: Begin,list(:),temp(:)
integer(kind=4),intent(in) :: Finish
integer(kind=4) :: Middle
if (Finish-Begin<2) then !if run size =1
return !it is sorted
else
! split longer runs into halves
Middle = (Finish+Begin)/2
! recursively sort both halves from list into temp
call MergeSort(list, Begin, Middle, temp)
call MergeSort(list, Middle, Finish, temp)
! merge sorted runs from temp into list
call Merge(temp, Begin, Middle, Finish, list)
endif
End Subroutine MergeSort
Subroutine Merge(list, Begin, Middle, Finish, temp)
implicit none
integer(kind=4),intent(inout) :: list(:),temp(:)
integer(kind=4),intent(in) ::Begin,Middle,Finish
integer(kind=4) :: kx,ky,kz
ky=Begin
kz=Middle
!! While there are elements in the left or right runs...
do kx=Begin,Finish-1
!! If left run head exists and is <= existing right run head.
if (ky.lt.Middle.and.(kz.ge.Finish.or.list(ky).le.list(kz))) then
temp(kx)=list(ky)
ky=ky+1
else
temp(kx)=list(kz)
kz = kz + 1
end if
end do
End Subroutine Merge
Function Unique(list)
!! usage sortedlist=Unique(list)
implicit none
integer(kind=4) :: strt,fin,N
integer(kind=4), intent(inout) :: list(:)
integer(kind=4), allocatable :: unique(:),work(:)
logical,allocatable :: mask(:)
! sort
work=list;strt=1;N=size(list);fin=N+1
call MergeSort(work,strt,fin,list)
! cull duplicate indices
allocate(mask(N));
mask=.false.
mask(1:N-1)=list(1:N-1)==list(2:N)
unique=pack(list,.not.mask)
End Function Unique
End Module SortUnique
Program TestUnique
use SortUnique
implicit none
! find "indices", the list of unique numbers in "list"
integer (kind=4),allocatable :: list(:),newlist(:)
integer (kind=4) :: kx,N=100000 !N even
real (kind=4) :: start,finish,myrandom
allocate(list(N))
do kx=1,N
call random_number(myrandom)
list(kx)=ifix(float(N)/2.*myrandom)
end do
call cpu_time(start)
newlist=unique(list)
call cpu_time(finish)
print *,"cull duplicates: ",finish-start
print *,"size(newlist) ",size(newlist)
End Program TestUnique
At @HighPerformanceMark's suggestion, the function is simply invoked as newlist=unique(list). The above is certainly not concise, but it seems clear, and it is about 200 times faster than either my original or the other solutions proposed.
I am now running a program for a certain number of iterations. The time step is 0.01. I want to write some information when a specific time is reached. For example:
program abc
implicit none
double precision :: time,step,target
integer :: x
time = 0.d0
step = 0.01
target = 5.d0
do x = 1,6000
time = time + step
! some equations here to calculate the model parameters
if(time.eq.target)then
write(*,*) "model parameters"
endif
enddo
end program abc
However, "time" never equals exactly 1.0, 2.0, etc. It shows values like "0.999999866" instead of "1.0" and "1.99999845" instead of "2.0".
Although I can use the integer "x" to decide when to write the information, I prefer to use the time step. Also, I may want to change the time step (0.01/0.02/0.05/etc.) or the target (5.0/6.0/8.0/etc.).
Does anyone know how to fix this? Thanks in advance.
You have now discovered floating point arithmetic! Just ensure that the time is sufficiently close to the target.
if(abs(time-target) < 0.5d0*step ) then
...
should do the trick.
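Put together, a minimal runnable sketch might look like this. Recomputing time from the loop counter instead of accumulating it is my own addition, meant to keep rounding errors from piling up. The names time_check and t_target are illustrative, and the step and target values are the ones from the question.

```fortran
program time_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: step = 0.01_dp
  real(dp), parameter :: t_target = 5.0_dp
  real(dp) :: time
  integer :: x

  do x = 1, 6000
    ! Recompute time from the counter so rounding errors do not accumulate.
    time = x * step
    ! Compare with a tolerance of half a step instead of testing equality.
    if (abs(time - t_target) < 0.5_dp * step) then
      print *, 'reached target at step ', x, ' time = ', time
    end if
  end do
end program time_check
```

Because neighbouring times are a full step apart, the half-step tolerance should fire for exactly one iteration (x = 500 here), and changing step or t_target needs no other edits.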
Floating point arithmetic is not perfect, and your variables are only exact up to a certain machine error that depends on their number format (32, 64, 128 bit). The following example illustrates this characteristic well:
PROGRAM main
USE, INTRINSIC :: ISO_FORTRAN_ENV, qp => real128
IMPLICIT NONE
REAL(qp) :: a, b, c
a = 128._qp
b = a/120._qp + 1
c = 120._qp*(b-1)
PRINT*, "a = ", a
PRINT*, "c = ", c
END PROGRAM main
Here is the output to this program with gfortran v.4.6.3:
a = 128.00000000000000000
c = 127.99999999999999999
I imagine this is something silly I've missed, but I've asked my whole class and no one can seem to work it out. I'm making a simple program that calls a subroutine, and I'm having trouble with the do loop reading in the entries of the matrix.
program Householder_Program
use QR_Factorisation
use numeric_kinds
complex(dp), dimension(:,:), allocatable :: A, Q, R, V
integer :: i, j, n, m
print *, 'Enter how many rows in the matrix A'
read *, m
print *, 'Enter how many columns in the matrix A'
read *, n
allocate(A(m,n), Q(m,n), R(n,n), V(n,n))
do i = 1,m
do j = 1,n
Print *, 'Enter row', i, 'and column', j, 'of matrix A'
read *, A(i,j)
end do
end do
call Householder_Triangularization(A,V,R,n,m)
print *, R
end program
It will ask me for A(1,1), but when I type in a number it does not ask me for A(1,2); it just leaves a blank line. When I try to put in a 2nd number, it errors and says:
Enter row 1 and column 1 of matrix A
1
2
At line 22 of file HouseholderProgram.f90 (unit = 5, file = 'stdin')
Fortran runtime error: Bad repeat count in item 1 of list input
Your variable A is (an array) of type complex. This means that when you attempt to do the list-directed input of the element values you cannot just specify a single number. So, in your case the problem is not with the program but with the input.
From the Fortran 2008 standard, 10.10.3
When the next effective item is of type complex, the input form consists of a left parenthesis followed by an ordered pair of numeric input fields separated by a comma (if the decimal edit mode is POINT) or semicolon (if the decimal edit mode is COMMA), and followed by a right parenthesis.
Input, then, must be something like (1., 12.).
You are trying to read in complex numbers (A is complex)! As such, you should specify complex numbers to the code... Since you are providing just one integer, the program does not know what to do.
Providing (1,0) and (2,0) instead of 1 and 2 will do the trick.
In case the user input is always real, and you want to read it into a complex type array you can do something like this:
Print *, 'Enter row', i, 'and column', j, 'of matrix A'
read *, dummy
A(i,j)=dummy
where dummy is declared real (with the same kind as A, here dp). This saves the user from having to key in the parentheses required for complex numbers. (The conversion to complex is automatic.)
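Both input styles can be tried without typing anything by reading from a character variable instead of the keyboard. This is only a sketch: the internal reads, the variable line and the sample values are stand-ins I chose for illustration.

```fortran
program complex_input
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  character(len=16) :: line
  complex(dp) :: a, b
  real(dp) :: dummy

  ! Internal reads stand in for keyboard input so the sketch runs unattended.
  line = '(1., 12.)'
  read(line, *) a        ! complex item: a parenthesised pair

  line = '2.5'
  read(line, *) dummy    ! real item: a bare number is fine
  b = dummy              ! converted to (2.5, 0.0) on assignment

  print *, 'a = ', a
  print *, 'b = ', b
end program complex_input
```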