Padding an array with zeroes in Fortran without using loops - fortran

I have two arrays, and I want to compare their sizes and add trailing zeroes to whichever array is shorter.
E.g., for the arrays
y1=(/ 1,2,3 /)
y2=(/ 1,2,3,4,5 /)
the final result should be
y1=(/ 1,2,3,0,0 /)
y2=(/ 1,2,3,4,5 /)
I am very new to Fortran, and from what I know so far, this can be done like this:
integer, allocatable :: y1(:),y2(:)
integer :: l1,l2,i
.
.
.
! some code to generate y1 and y2 here
.
.
.
l1=size(y1)
l2=size(y2)
if (l1>l2) then
do i=l2+1,l1
y2(i)=0
enddo
else if (l2>l1) then
do i=l1+1,l2
y1(i)=0
enddo
endif
I want to know if there is a better way of doing this, preferably one that doesn't involve loops, since the actual problem I am working on might have huge vectors.

Here's one way:
y1 = RESHAPE(y1,SHAPE(y2),pad=[0])
No explicit loops. (This is written for the case where y1 is the shorter array; swap the roles otherwise.) As @VladimirF commented, the shorter array has to be reallocated; this approach leaves that to the compiler and the run-time.
If you are concerned about the performance of this approach relative to a version using explicit loops, or about how it scales with array size, run some tests. I wouldn't be surprised to find that explicit reallocation and a loop or two are faster than this 'clever' approach.
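For comparison when timing, the "explicit reallocation" alternative mentioned above might look like the following. This is only a sketch: pad_to is a hypothetical helper name (not from the question), and it uses array assignments plus the MOVE_ALLOC intrinsic rather than a DO loop.

```fortran
program pad_demo
  implicit none
  integer, allocatable :: y1(:), y2(:)

  y1 = [1, 2, 3]
  y2 = [1, 2, 3, 4, 5]

  ! Pad whichever array is shorter; the call on the longer one does nothing.
  call pad_to(y1, size(y2))
  call pad_to(y2, size(y1))

  print '(*(i0,1x))', y1
  print '(*(i0,1x))', y2

contains

  ! Hypothetical helper: grow a to length n, filling the tail with zeros.
  subroutine pad_to(a, n)
    integer, allocatable, intent(inout) :: a(:)
    integer, intent(in) :: n
    integer, allocatable :: tmp(:)
    integer :: m

    m = size(a)
    if (n <= m) return          ! already long enough
    allocate(tmp(n))
    tmp(:m) = a                 ! copy the existing values
    tmp(m+1:) = 0               ! zero the new tail
    call move_alloc(tmp, a)     ! a takes over tmp's storage, tmp is deallocated
  end subroutine pad_to

end program pad_demo
```

Whether this beats the RESHAPE one-liner depends on the compiler and on the array sizes, so it is worth measuring both on realistic data.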

If you are only concerned with rank 1 arrays, using SHAPE and RESHAPE is overkill. Simply use Fortran's array constructor features. You also can use modern Fortran's allocation-on-assignment feature, so you do not need to re-allocate the shorter array.
program foo
implicit none
integer, allocatable :: y1(:),y2(:)
integer :: l1,l2,i
y1 = [1, 2, 3]
y2 = [1, 2, 3, 4, 5]
l1 = size(y1)
l2 = size(y2)
if (l1 > l2) y2 = [y2, [(0,i=1,l1-l2)]]
if (l2 > l1) y1 = [y1, [(0,i=1,l2-l1)]]
print '(10(I0,1X))', y1
print '(10(I0,1X))', y2
end program foo

Related

Fortran code for Monte Carlo Integration within boundary point a and b

I understand that Monte Carlo simulation estimates an area by generating random points and calculating the ratio of the points that fall inside the curve to the total number of points.
I have successfully calculated the value of pi, assuming the radius of the curve to be unity.
Here is the code
program pi
implicit none
integer :: count, n, i
real :: r, x, y
count = 0
n=500
CALL RANDOM_SEED
DO i = 1, n
CALL RANDOM_NUMBER(x)
CALL RANDOM_NUMBER(y)
IF (x*x + y*Y <1.0) count = count + 1
END DO
r = 4 * REAL(count)/n
print *, r
end program pi
To compute an integral, the textbook says to apply the same idea, but I'm lost on how to write the code if I want to integrate
f(x)=sqrt(1+x**2) from a = 1 to b = 5
Before, when the radius was one, I decided whether a point fell inside using the condition x**2+y**2 < 1, but how can I handle the case above?
Any help would be greatly appreciated.
I will write the code first and then explain:
Program integral
implicit none
real f
integer, parameter:: a=1, b=5, Nmc=10000000 !a the lower bound, b the upper bound, Nmc the size of the sampling (the higher, the more accurate the result)
integer:: i !loop counter, must be declared because of implicit none
real:: x
double precision:: SUM=0 !accumulated in double precision, since adding 10**7 terms in single precision loses accuracy
do i=1,Nmc !Starting MC sampling
call RANDOM_NUMBER(x) !generating random number x in range [0,1)
x=a+x*(b-a) !converting x to be in range [a,b)
SUM=SUM+f(x) !summing all values of f(x). EDIT: SUM is also an intrinsic function in Fortran so don't call your variable this, I named it so, to illustrate its purpose
enddo
print*, (b-a)*(SUM/Nmc) !final result of your integral
end program integral
function f(x) !defining your function
implicit none
real, intent(in):: x
real:: f
f=sqrt(1+x**2)
end function f
So what's happening:
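The integral can be written as
$$I = \int_a^b f(x)\,dx = \int_a^b \frac{f(x)}{g(x)}\,g(x)\,dx,$$
where
$$g(x) = \frac{1}{b-a}$$
(this g(x) is a uniform probability distribution of the variable x in [a,b]). And we can write the integral as:
$$I = (b-a)\int_a^b f(x)\,g(x)\,dx = (b-a)\,\langle f \rangle,$$
where $\langle f \rangle \approx \frac{1}{N_{mc}}\sum_{i=1}^{N_{mc}} f(x_i)$ is the average of f over points $x_i$ drawn uniformly from [a,b].
So, finally, we get that the integral should be:
$$I \approx \frac{b-a}{N_{mc}}\sum_{i=1}^{N_{mc}} f(x_i)$$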
So, all you have to do is generate a random number in the range [a,b] and then calculate the value of your function for this x. Do this many times (Nmc times) and sum the values. Then divide the sum by Nmc to find the average, and multiply by (b-a). This is what the code does.
There is plenty of material on the internet about this, including examples that visualize it nicely.
EDIT: Second way, that is the same as the Pi method:
Nin=0 !Number of points inside the function (under the curve)
do i=1,Nmc
call random_number(x)
call random_number(y)
x=a+x*(b-a)
y=f_min+y*(f_max-f_min)
if (y<f(x)) Nin=Nin+1 !the point lies under the curve
enddo
print*, f_min*(b-a)+(f_max-f_min)*(b-a)*(real(Nin)/Nmc) !rectangle below f_min plus the sampled part of the box
You could then enclose all of this in an outer do loop, summing the estimates and printing their average at the end.
In this example you essentially create an enclosing box from a to b (x dimension) and from f_min to f_max (y dimension), sample points uniformly inside it, and count the points that fall under the curve (Nin). The fraction Nin/Nmc times the box area gives the area between y=f_min and the curve, so the rectangle f_min*(b-a) below the box has to be added to obtain the integral (it vanishes if you choose f_min=0). Obviously you have to know the minimum (f_min) and maximum (f_max) values of your function in the range [a,b]. Alternatively you could use looser lower/upper bounds for f_min and f_max, but then you waste a lot of points and your error will be bigger.
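To see how the two estimators behave, they can be run side by side and compared with the closed-form value of the integral. The following is a sketch under some assumptions: the sample size nmc is arbitrary, the program and function names are made up for illustration, and f_min and f_max are taken at the end points because sqrt(1+x**2) is increasing on [1,5].

```fortran
program mc_compare
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nmc = 1000000            ! sample size (arbitrary choice)
  real(dp), parameter :: a = 1.0_dp, b = 5.0_dp  ! integration bounds
  real(dp) :: x, y, total, f_min, f_max
  integer :: i, nin

  ! f is increasing on [a,b], so its extrema are at the end points.
  f_min = f(a)
  f_max = f(b)

  total = 0.0_dp
  nin = 0
  do i = 1, nmc
     call random_number(x)
     call random_number(y)
     x = a + x*(b - a)                   ! uniform in [a,b)
     y = f_min + y*(f_max - f_min)       ! uniform in [f_min,f_max)
     total = total + f(x)                ! sample-mean estimator
     if (y < f(x)) nin = nin + 1         ! hit-or-miss estimator
  end do

  print *, 'sample mean :', (b - a)*total/nmc
  print *, 'hit or miss :', (b - a)*(f_min + (f_max - f_min)*real(nin, dp)/nmc)
  print *, 'closed form :', prim(b) - prim(a)

contains

  pure function f(x) result(fx)
    real(dp), intent(in) :: x
    real(dp) :: fx
    fx = sqrt(1.0_dp + x**2)
  end function f

  ! Antiderivative of sqrt(1+x**2): (x*sqrt(1+x**2) + asinh(x))/2,
  ! with asinh(x) written as log(x + sqrt(1+x**2)).
  pure function prim(x) result(p)
    real(dp), intent(in) :: x
    real(dp) :: p
    p = 0.5_dp*(x*sqrt(1.0_dp + x**2) + log(x + sqrt(1.0_dp + x**2)))
  end function prim

end program mc_compare
```

Both estimates fluctuate from run to run; increasing nmc narrows the spread around the closed-form value.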

Is it possible to use vector arguments in Fortran IF statements rather than scalar arguments

Rather than using three consecutive scalar IF statements I would like to use a single IF statement with a vector argument, if this is possible. I can't figure out how.
The reason for wanting this is to test it for speed. My code can run for days calling this section billions of times. Even a little speed up can make a large difference.
Here is the working code with three IF statements for a dummy scenario.
program main
!==============================================
! Define variables
!==============================================
real, dimension(10,3) :: r ! 10 atoms each with x,y,z coordinates
real, dimension(3) :: rij ! x,y,z vector of difference between two atoms
real :: Box_Length ! length of simulation box
real :: time, timer_start, timer_end
integer :: timer
!=======================================================
! Begin Program body
!=======================================================
Box_Length = 1.0 ! based on a box of length = 1 since coords are randomly generated between 0 and 1
!=================================
! Generate random atom coordinates
!=================================
r = 0.0
CALL RANDOM_NUMBER (r)
!=================================
! Begin algorithm
!=================================
call cpu_time(timer_start)
do timer = 1,30000
do i = 1,size(r,1) ! loop over atoms (size(r) alone is 30, the total element count)
do j = 1, size(r,1)
if(i == j) cycle
rij(:) = abs(r(i,:) - r(j,:))
!==============================
! Apply mirror image convention
!==============================
if(rij(1) > Box_Length - rij(1) ) rij(1) = rij(1) - Box_Length
if(rij(2) > Box_Length - rij(2) ) rij(2) = rij(2) - Box_Length
if(rij(3) > Box_Length - rij(3) ) rij(3) = rij(3) - Box_Length
!*******************************************************************************
! Question: Can I make it into a single if statement i.e. *
! *
! if(rij(:) > Box_Length(:) - rij(:) ) rij(:) = rij(:) - Box_Length(:) *
! *
! Where Box_Length is now a vector and only the coordinate that triggers *
! the if statement is modified. Meaning that if { rij(2) > Box_Length - rij(2) } *
! only rij(2) is modified, not all three. *
! I have tried making Box_Length a vector, but that failed. *
!*******************************************************************************
! insert rest of algorithm
enddo ! j-loop
enddo ! i loop
enddo ! timer loop
call cpu_time(timer_end)
time = timer_end - timer_start
print*, 'Time taken was: ', time
end program main
Thanks for any help on turning this into a vectorized IF statement. Also, I flip back and forth between column and row vectors. Currently column vectors are working faster for me. This IS NOT a question about column versus row vectors. I do my own timing and use the faster method. I simply can't get a working vector method to try timing against.
"if(rij(:) > Box_Length(:) - rij(:) ) rij(:) = rij(:) - Box_Length(:)"
can be
where (rij > Box_Length - rij) rij = rij - Box_Length
Note that this will not make it faster than an explicit DO loop; it is just a shorter way to write it. It can even make it slower, because a temporary array may be used or the compiler may have a hard time vectorizing it, in the SIMD sense.
I advise against using the word "vectorization" for the shorthand array notation in Fortran. In Fortran, vectorization normally means using SIMD CPU instructions, and that is what compilers call vectorization. Your notion of vectorization comes from Python; it is not used in Fortran and is misleading to other readers.
Also read https://software.intel.com/en-us/blogs/2008/03/31/doctor-it-hurts-when-i-do-this to see why you should use just rij and not rij(:).
TL;DR: It is possible to write it on one line, but in Fortran array notation is NOT the way to make a program faster. It often has the opposite effect.
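As a quick check that the WHERE statement touches only the elements that satisfy the condition, it can be compared against the element-by-element IF version. This is a small sketch with made-up separation values; rij_loop is a name introduced here purely for the comparison.

```fortran
program where_demo
  implicit none
  real, parameter :: box_length = 1.0
  real :: rij(3), rij_loop(3)
  integer :: k

  rij = [0.2, 0.7, 0.5]        ! made-up separations for illustration
  rij_loop = rij

  ! Masked array assignment: only elements where the mask is true are changed.
  where (rij > box_length - rij) rij = rij - box_length

  ! The same update written as an explicit loop over the three components.
  do k = 1, 3
     if (rij_loop(k) > box_length - rij_loop(k)) rij_loop(k) = rij_loop(k) - box_length
  end do

  print *, rij
  print *, 'identical to loop version: ', all(rij == rij_loop)
end program where_demo
```

With both forms available, the timing comparison the question asks for can be made directly inside the real i/j loops.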

Sum of product: can we vectorize the following in C++? (using Eigen or other libraries)

UPDATE: the (sparse) three-dimensional matrix v in my question below is symmetric: v(i1,i2,i3) = v(j1,j2,j3) where (j1,j2,j3) is any of the 6 permutations of (i1,i2,i3), i.e.
v(i1,i2,i3) = v(i1,i3,i2) = v(i2,i3,i1) = v(i2,i1,i3) = v(i3,i1,i2) = v(i3,i2,i1).
Moreover, v(i1,i2,i3) != 0 only when i1 != i2 && i1 != i3 && i2 != i3.
E.g. v(i,i,j) = 0, v(i, k, k) = 0, v(k, j, k) = 0, etc...
I thought that with this additional information, I could already get a significant speed-up by doing the following:
Remark: v contains duplicate values (a triplet (i,j,k) has 6 permutations, and the values of v for these 6 are the same).
So I defined a more compact matrix u that contains only non-duplicates of v. The indices of u are (i1,i2,i3) where i1 < i2 < i3. The length of u is equal to the length of v divided by 6.
I computed the sum by iterating over the new value vector and the new index vectors.
With this, I only got a little speed-up. I realized that instead of iterating N times doing a multiplication each time, I iterated N/6 times doing 6 multiplications each time, and that's pretty much the same as before :(
Hope somebody could come up with a better solution.
--- (Original question) ---
In my program I have an expensive operation that is repeated every iteration.
I have three n-dimensional vectors x1, x2 and x3 that are supposed to change every iteration.
I have four N-dimensional vectors I1, I2, I3 and v that are pre-defined and will not change, where:
I1, I2 and I3 contain the indices of respectively x1, x2 and x3 (the elements in I_i are between 0 and n-1)
v is a vector of values.
For example:
We can see v as a (reshaped) sparse three-dimensional matrix, each index k of v corresponds to a triplet (i1,i2,i3) of indices of x1, x2, x3.
I want to compute at each iteration three n-dimensional vectors y1, y2 and y3 defined by:
y1[i1] = sum_{i2,i3} v(i1,i2,i3)*x2(i2)*x3(i3)
y2[i2] = sum_{i1,i3} v(i1,i2,i3)*x1(i1)*x3(i3)
y3[i3] = sum_{i1,i2} v(i1,i2,i3)*x1(i1)*x2(i2)
More precisely what the program does is:
Repeat:
Compute y1 then update x1 = f(y1)
Compute y2 then update x2 = f(y2)
Compute y3 then update x3 = f(y3)
where f is some external function.
I would like to know if there is a C++ library that helps me to do so as fast as possible. Using for loops is just too slow.
Thank you very much for your help!
Update: Looks like it's not easy to get a better solution than the straight-forward for loops. If the vector of indices I1 above is ordered in non-decreasing order, can we compute y1 faster?
For example: I1 = [0 0 0 0 1 1 2 2 2 3 3 3 ... n n].
The simple answer is no, at least not trivially. Your access pattern (e.g. x2(i2)*x3(i3)) does not (at least at compile time) access contiguous memory, but rather has a layer of indirection. Because of this, SIMD instructions are pretty useless, as they work on contiguous chunks of memory. What you may want to consider is creating a copy of xM sorted according to iM, which removes the layer of indirection. This should reduce the number of cache misses that xM(iM) generates, and since it is accessed N times, that may reduce some of the wall time (assuming N is large).
If maximal accuracy is not critical, you may want to consider using a FFT method instead of the convolution (at least, that's how I understood your question. Feel free to correct me if I'm wrong).
Assuming you are doing a convolution and the vectors (a and b, same size as in your question) are large, the result (c) can be calculated naïvely as
// O(n^2)
for(int i = 0; i < c.size(); i++)
c(i) = a(i) * b.array();
Using the convolution theorem, you could take the Fourier transform of both a and b and perform an element wise multiplication and then take the inverse Fourier transform of the result to get c (will probably differ a little):
// O(n log(n)); note A, B, and C are vectors of complex floating point numbers
fft.fwd(a, A);
fft.fwd(b, B);
C = A.array() * B.array();
fft.inv(C, c);

Secant method, negative answers [duplicate]

I am trying to write a program to solve for pipe diameter for a pump system I've designed. I've done this on paper and understand the mechanics of the equations. I would appreciate any guidance.
EDIT: I have updated the code with some suggestions from users, still seeing quick divergence. The guesses in there are way too high. If I figure this out I will update it to working.
MODULE Sec
CONTAINS
SUBROUTINE Secant(fx,xold,xnew,xolder)
IMPLICIT NONE
INTEGER,PARAMETER::DP=selected_real_kind(15)
REAL(DP), PARAMETER:: gamma=62.4
REAL(DP)::z,phead,hf,L,Q,mu,rho,rough,eff,pump,nu,ppow,fric,pres,xnew,xold,xolder,D
INTEGER::I,maxit
INTERFACE
omitted
END INTERFACE
Q=0.0353196
Pres=-3600.0
z=-10.0
L=50.0
mu=0.0000273
rho=1.940
nu=0.5
rough=0.000005
ppow=412.50
xold=1.0
xolder=0.90
D=11.0
phead = (pres/gamma)
pump = (nu*ppow)/(gamma*Q)
hf = phead + z + pump
maxit=10
I = 1
DO
xnew=xold-((fx(xold,L,Q,hf,rho,mu,rough)*(xold-xolder))/ &
(fx(xold,L,Q,hf,rho,mu,rough)-fx(xolder,L,Q,hf,rho,mu,rough)))
xolder = xold
xold = xnew
I=I+1
WRITE(*,*) "Diameter = ", xnew
IF (ABS(fx(xnew,L,Q,hf,rho,mu,rough)) <= 1.0d-10) THEN
EXIT
END IF
IF (I >= maxit) THEN
EXIT
END IF
END DO
RETURN
END SUBROUTINE Secant
END MODULE Sec
PROGRAM Pipes
USE Sec
IMPLICIT NONE
INTEGER,PARAMETER::DP=selected_real_kind(15)
REAL(DP)::xold,xolder,xnew
INTERFACE
omitted
END INTERFACE
CALL Secant(f,xold,xnew,xolder)
END PROGRAM Pipes
FUNCTION f(D,L,Q,hf,rho,mu,rough)
IMPLICIT NONE
INTEGER,PARAMETER::DP=selected_real_kind(15)
REAL(DP), PARAMETER::pi=3.14159265d0, g=9.81d0
REAL(DP), INTENT(IN)::L,Q,rough,rho,mu,hf,D
REAL(DP)::f, fric, reynold, coef
fric=(hf/((L/D)*(((4.0*Q)/(pi*D**2))/2*g)))
reynold=((rho*(4.0*Q/pi*D**2)*D)/mu)
coef=(rough/(3.7d0*D))
f=(1/SQRT(fric))+2.0d0*log10(coef+(2.51d0/(reynold*SQRT(fric))))
END FUNCTION
You very clearly declare the function in the interface (and the implementation) as
FUNCTION f(L,D,Q,hf,rho,mu,rough)
IMPLICIT NONE
INTEGER,PARAMETER::DP=selected_real_kind(15)
REAL(DP), PARAMETER::pi=3.14159265, g=9.81
REAL(DP), INTENT(IN)::L,Q,rough,rho,mu,hf,D
REAL(DP)::fx
END FUNCTION
So you need to pass 7 arguments to it. And none of them are optional.
But when you call it, you call it as
xnew=xold-fx(xold)*((xolder-xold)/(fx(xolder)-fx(xold))
supplying a single argument to it. When you try to compile it with gfortran for example, the compiler will complain for not getting any argument for D (the second dummy argument), because it stops with the first error.
It seems that the initial values for xold and xolder are too far from the solution. If we change them to
xold = 3.0d-5
xolder = 9.0d-5
and tighten the convergence threshold to
IF (ABS(fx(xnew,L,Q,hf,rho,mu,rough)) <= 1.0d-10) THEN
then we get
...
Diameter = 7.8306011049894322E-005
Diameter = 7.4533171406818087E-005
Diameter = 7.2580746283970710E-005
Diameter = 7.2653611474296094E-005
Diameter = 7.2652684750264582E-005
Diameter = 7.2652684291155581E-005
Here, we note that the function f(x) is defined as
FUNCTION f(D,L,Q,hf,rho,mu,rough)
...
f = (1/(hf/((L/D)*((4*Q)/pi*D)))) & !! (1)
+ 2.0 * log( (rough/(3.7*D)) + (2.51/(((rho*((4*Q)/pi*D))/mu) & !! (2)
* (hf/((L/D)*((4*Q)/pi*D))))) & !! (3)
)
END FUNCTION
where terms in Lines (1) and (3) are both constant, while terms in Line (2) are some constants over D. So, we see that f(D) = c1 - 2.0 * log( D / c2 ), so we can obtain the solution analytically as D = c2 * exp(c1/2.0) = 7.26526809959e-5, which agrees well with the numerical solution above. To get a rough idea of where the solution is, it is useful to plot f(D) as a function of D, e.g. using Gnuplot.
But I am afraid that the expression for f(D) itself (as given in the Fortran code) might contain a typo because of the many parentheses. To avoid such issues, it is always useful to first rearrange the expression for f(D) into the simplest possible form before writing a program.
(One TIP is to extract constant factors outside and pre-calculate them.)
Also, for debugging purposes it is sometimes useful to check the consistency of physical dimensions and physical units of various terms. Indeed, if the magnitude of the obtained solution is too large or too small, there might be some problem of conversion factors for physical units, for example.
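One way to separate problems in the iteration from problems in f(D) is to test the secant update on a function whose root is known. The sketch below is not the code from the question: the module, the routine signature, and the test function are assumptions made for illustration. The test function has the same shape as the simplified form above, c1 - 2*log(x/c2), with made-up constants c1 = 1 and c2 = 3, so its root is c2*exp(c1/2).

```fortran
module secant_mod
  implicit none
  integer, parameter :: dp = selected_real_kind(15)

  abstract interface
     function fun(x) result(y)
       import :: dp
       real(dp), intent(in) :: x
       real(dp) :: y
     end function fun
  end interface

contains

  ! Secant iteration starting from the two guesses x0 and x1.
  subroutine secant(f, x0, x1, tol, maxit, root, converged)
    procedure(fun) :: f
    real(dp), intent(in) :: x0, x1, tol
    integer, intent(in) :: maxit
    real(dp), intent(out) :: root
    logical, intent(out) :: converged
    real(dp) :: xolder, xold, xnew, folder, fold
    integer :: it

    xolder = x0
    xold = x1
    folder = f(xolder)
    fold = f(xold)
    root = xold
    converged = .false.
    do it = 1, maxit
       if (fold == folder) exit          ! flat secant line, cannot divide
       xnew = xold - fold*(xold - xolder)/(fold - folder)
       xolder = xold
       folder = fold
       xold = xnew
       fold = f(xnew)                    ! one new function evaluation per step
       root = xnew
       if (abs(fold) <= tol) then
          converged = .true.
          exit
       end if
    end do
  end subroutine secant

  ! Test function with a known root at 3*exp(0.5); constants are made up.
  function test_fun(x) result(y)
    real(dp), intent(in) :: x
    real(dp) :: y
    y = 1.0_dp - 2.0_dp*log(x/3.0_dp)
  end function test_fun

end module secant_mod

program secant_demo
  use secant_mod
  implicit none
  real(dp) :: root
  logical :: ok

  call secant(test_fun, 1.0_dp, 2.0_dp, 1.0e-12_dp, 50, root, ok)
  print *, 'secant root :', root, ' converged: ', ok
  print *, 'known root  :', 3.0_dp*exp(0.5_dp)
end program secant_demo
```

Storing the previous function values also avoids evaluating f three times per iteration, as the original update line does. If the iteration reproduces the known root here but diverges on the pipe equation, the starting guesses or the expression for f(D) are the more likely culprits.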

Reduction on array in FORTRAN

I'm trying to parallelize a module in my FORTRAN code using OpenMP and I'm running into some issues with threads overwriting updated values in the array. Obviously my first instinct was to do a reduction, but I'm not really sure how to go about it in this context, as I've only done it in a simple x = x + update kind of situation, whereas this is similar, but does so in a normally out-of-order fashion, and also in an array.
subroutine chargeInterp(q,x,xmin,xmax,dg,np,ng)
real(kind = 8) :: charge, dg, xmin, weight, xmax,wp
integer :: g1,g2,g1temp,g2temp,i,np,ng
real(kind = 8), dimension(np) :: q,x
!$OMP PARALLEL DO PRIVATE(g1,g2) REDUCTION(+:q)
do i=1,np
g1 = floor((x(i)-xmin)/dg)
g2 = g1 + 1
wp=((x(i)-xmin)/dg-g1)
weight=1-wp
q(g1+1) = q(g1+1) - weight
q(g2+1) = q(g2+1) - wp
enddo
!$OMP END PARALLEL DO
Just to give a rundown of what it's doing, essentially it's taking the position of a particle and weighting its charge onto adjacent grid points on the mesh.
Thanks for the help!
P.S. The omp statements wrapped around the loop don't work. Just throwing that one out there. Have also tried !$OMP ATOMIC before updating q. Compiles and runs, but my results don't match my un-parallelized results, so it's a no-go.
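For reference, here is a minimal, self-contained sketch of an array reduction for this kind of deposition loop. It rests on assumptions that are not confirmed by the question: the charge array is sized by the number of grid points (ng) rather than the number of particles, and the per-particle temporaries (g1, wp) are made PRIVATE so that threads do not share them. The sizes and the names q_ser and q_par are made up for the comparison.

```fortran
program deposit_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: np = 100000, ng = 64       ! made-up sizes
  real(dp), parameter :: xmin = 0.0_dp, dg = 1.0_dp
  real(dp) :: x(np), q_ser(ng), q_par(ng), wp
  integer :: i, g1

  call random_number(x)
  x = xmin + x*(ng - 1)*dg       ! keeps g1+2 <= ng for every particle

  ! Serial reference.
  q_ser = 0.0_dp
  do i = 1, np
     g1 = floor((x(i) - xmin)/dg)
     wp = (x(i) - xmin)/dg - g1
     q_ser(g1+1) = q_ser(g1+1) - (1.0_dp - wp)
     q_ser(g1+2) = q_ser(g1+2) - wp
  end do

  ! Parallel version: each thread accumulates into its own copy of q_par,
  ! and the copies are summed when the loop ends.
  q_par = 0.0_dp
  !$omp parallel do private(g1, wp) reduction(+:q_par)
  do i = 1, np
     g1 = floor((x(i) - xmin)/dg)
     wp = (x(i) - xmin)/dg - g1
     q_par(g1+1) = q_par(g1+1) - (1.0_dp - wp)
     q_par(g1+2) = q_par(g1+2) - wp
  end do
  !$omp end parallel do

  print *, 'max abs difference: ', maxval(abs(q_par - q_ser))
end program deposit_demo
```

Because the reduction adds the contributions in a different order than the serial loop, the two results can be expected to agree only to rounding error, not bit for bit.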