Non Tabular data structure in Fortran - fortran

I would like to build a data structure for non tabular data. I am not sure what is the right way to do that in (modern) Fortran.
I have a data set of houses that includes their location (lat,lon) and price. I have another data of factories that include their location (lat,lon) and the amount of pollution they produce. For each house I need to create a list of factories which are within 5km radius of the house. Not just the number of these factories but the whole (lat,lon,pollution) vectors of these factories. Each house has a different number of factories close to it ranging from zero to about eighty.
MODULE someDefinitions
IMPLICIT NONE
INTEGER, PARAMETER :: N_houses=82390, N_factories=4215
TYPE house
REAL :: lat,lon,price
! a few more fields which are not important here
END TYPE
TYPE factory
REAL :: lat,lon,pollution
! a few more fields which are not important here
END TYPE
Contains
PURE FUNCTION haversine(deglat1,deglon1,deglat2,deglon2) RESULT (dist)
! Some code for computing haversine distance in meters
END FUNCTION haversine
END MODULE someDefinitions
PROGRAM createStructure
USE someDefinitions
IMPLICIT NONE
TYPE(factory), DIMENSION(N_factories) :: factories
TYPE(house), DIMENSION(N_houses) :: houses
INTEGER :: i,j
! more variables definitions as needed
! code to read houses data from the disk
! code to read factories data from the disk
DO i=1,N_houses
DO j=1,N_factories
!here I compute the distance between houses(i) and factories(j)
! If this distance<=5000 I want to add the index j to the list of indices
! associated with house i. How? What is the right data structure to do
! that? some houses have zero factories within 5000 meters from them.
! Some houses have about 80 factories around them. It's unbalanced.
END DO !j
END DO !i
END PROGRAM createStructure
The created structure will then be used in further calculations. A matrix of N_houses x N_factories is way too large to save in memory.
Note: I know Fortran 2008 if that is helpful in any way.

Using too many nested derived types can become tedious. Here is an example using 2D arrays for all data except the required list. This is similar to the K-Nearest Neighbors (KNN) algorithm naively implemented. There may be better algorithms, of course, but the following can be a good start.
program NoStrucyures
implicit none
type listi
real, allocatable :: item(:,:)
end type
integer, parameter :: N_houses=82390, N_factories=4215
real :: houses(N_houses,3)
real :: factories(N_factories,3)
real :: distance(N_factories)
type(listi) :: list(N_houses)
integer :: i, j, k, within5k
! Generating dummy data
call random_number(houses)
call random_number(factories)
houses = houses * 500000
factories = factories * 500000
do i = 1, N_houses
distance = sqrt((houses(i,1)-factories(:,1))**2 + (houses(i,2)-factories(:,2))**2)
within5k = count( distance <= 5000 )
if (within5k > 0) then
allocate(list(i)%item(within5k,3))
k = 0
do j = 1, N_factories
if (distance(j) <= 5000) then
k = k + 1
list(i)%item(k,:) = factories(j,:)
end if
end do
else
list(i)%item = reshape([-1, -1, -1],[1,3])
end if
end do
do i=1,10
print *, list(i)%item
end do
end program NoStrucyures

Related

Computing the Jacobian matrix in Fortran

In Newton's method, to solve a nonlinear system of equations we need to find the Jacobian matrix and the determinant of the inverse of the Jacobian matrix.
Here are my component functions,
real function f1(x,y)
parameter (pi = 3.141592653589793)
f1 = log(abs(x-y**2)) - sin(x*y) - sin(pi)
end function f1
real function f2(x,y)
f2 = exp(x*y) + cos(x-y) - 2
end function f2
For the 2x2 case I am computing the Jacobian matrix and determinant of the inverse of Jacobian matrix like this,
x = [2,2]
h = 0.00001
.
.
! calculate approximate partial derivative
! you can make it more accurate by reducing the
! value of h
j11 = (f1(x(1)+h,x(2))-f1(x(1),x(2)))/h
j12 = (f1(x(1),x(2)+h)-f1(x(1),x(2)))/h
j21 = (f2(x(1)+h,x(2))-f2(x(1),x(2)))/h
j22 = (f2(x(1),x(2)+h)-f2(x(1),x(2)))/h
! calculate the Jacobian
J(1,:) = [j11,j12]
J(2,:) = [j21,j22]
! calculate inverse Jacobian
inv_J(1,:) = [J(2,2),-J(1,2)]
inv_J(2,:) = [-J(2,1),J(1,1)]
DET=J(1,1)*J(2,2) - J(1,2)*J(2,1)
inv_J = inv_J/DET
.
.
How do I in Fortran extend this to evaluate a Jacobian for m functions evaluated at n points?
Here is a more flexible jacobian calculator.
The results with the 2×2 test case are what you expect
arguments (x)
2.00000000000000
2.00000000000000
values (y)
1.44994967586787
53.5981500331442
Jacobian
0.807287239448229 3.30728724371454
109.196300248300 109.196300248300
I check the results against a symbolic calculation for the given inputs of
Console.f90
program Console1
use ISO_FORTRAN_ENV
implicit none
! Variables
integer, parameter :: wp = real64
real(wp), parameter :: pi = 3.141592653589793d0
! Interfaces
interface
function fun(x,n,m) result(y)
import
integer, intent(in) :: n,m
real(wp), intent(in) :: x(m)
real(wp) :: y(n)
end function
end interface
real(wp) :: h
real(wp), allocatable :: x(:), y(:), J(:,:)
! Body of Console1
x = [2d0, 2d0]
h = 0.0001d0
print *, "arguments"
print *, x(1)
print *, x(2)
y = test(x,2,2)
print *, "values"
print *, y(1)
print *, y(2)
J = jacobian(test,x,2,h)
print *, "Jacobian"
print *, J(1,:)
print *, J(2,:)
contains
function test(x,n,m) result(y)
! Test case per original question
integer, intent(in) :: n,m
real(wp), intent(in) :: x(m)
real(wp) :: y(n)
y(1) = log(abs(x(1)-x(2)**2)) - sin(x(1)*x(2)) - sin(pi)
y(2) = exp(x(1)*x(2)) + cos(x(1)-x(2)) - 2
end function
function jacobian(f,x,n,h) result(u)
procedure(fun), pointer, intent(in) :: f
real(wp), allocatable, intent(in) :: x(:)
integer, intent(in) :: n
real(wp), intent(in) :: h
real(wp), allocatable :: u(:,:)
integer :: j, m
real(wp), allocatable :: y1(:), y2(:), e(:)
m = size(x)
allocate(u(n,m))
do j=1, m
e = element(j, m) ! Get kronecker delta for j-th value
y1 = f(x-e*h/2,n,m)
y2 = f(x+e*h/2,n,m)
u(:,j) = (y2-y1)/h ! Finite difference for each column
end do
end function
function element(i,n) result(e)
! Kronecker delta vector. All zeros, except the i-th value.
integer, intent(in) :: i, n
real(wp) :: e(n)
e(:) = 0d0
e(i) = 1d0
end function
end program Console1
I will answer about evaluation in different points. This is quite simple. You just need an array of points, and if the points are in some regular grid, you may not even need that.
You may have an array of xs and array of ys or you can have an array of derived datatype with x and y components.
For the former:
real, allocatable :: x(:), y(:)
x = [... !probably read from some data file
y = [...
do i = 1, size(x)
J(i) = Jacobian(f, x(i), y(i))
end do
If you want to have many functions at the same time, the problem is always in representing functions. Even if you have an array of function pointers, you need to code them manually. A different approach is to have a full algebra module, where you enter some string representing a function and you can evaluate such function and even compute derivatives symbolically. That requires a parser, an evaluator, it is a large task. There are libraries for this. Evaluation of such a derivative will be slow unless further optimizing steps (compiling to machine code) are undertaken.
Numerical evaluation of the derivative is certainly possible. It will slow the convergence somewhat, depending on the order of the approximation of the derivative. You do a difference of two points for the numerical derivative. You can make an interpolating polynomial from values in multiple points to get a higher-order approximation (finite difference approximations), but that costs machine cycles.
Normally we can use auto difference tools as #John Alexiou mentioned. However in practise I prefer using MATLAB to analytically solve out the Jacobian and then use its build-in function fortran() to convert the result to a f90 file. Take your function as an example. Just type these into MATLAB
syms x y
Fval=sym(zeros(2,1));
Fval(1)=log(abs(x-y^2)) - sin(x*y) - sin(pi);
Fval(2)=exp(x*y) + cos(x-y) - 2;
X=[x;y];
Fjac=jacobian(Fval,X);
fortran(Fjac)
which will yield
Fjac(1,1) = -y*cos(x*y)-((-(x-y**2)/abs(-x+y**2)))/abs(-x+y**2)
Fjac(1,2) = -x*cos(x*y)+(y*((-(x-y**2)/abs(-x+y**2)))*2.0D0)/abs(-
&x+y**2)
Fjac(2,1) = -sin(x-y)+y*exp(x*y)
Fjac(2,2) = sin(x-y)+x*exp(x*y)
to you. You just get an analytical Jacobian fortran function.
Meanwhile, it is impossible to solve the inverse of a mxn matrix because of rank mismatching. You should simplify the system of equations to get a nxn Jacobin.
Additionally, when we use Newton-Raphson's method we do not solve the inverse of the Jacobin which is time-consuming and inaccurate for a large system. An easy way is to use dgesv in LAPACK for dense Jacobin. As we only need to solve the vector x from system of linear equations
Jx=-F
dgesv use LU decomposition and Gaussian elimination to solve above system of equations which is extremely faster than solving inverse matrix.
If the system of equations is large, you can use UMFPACK and its fortran interface module mUMFPACK to solve the system of equations in which J is a sparse matrix. Or use subroutine ILUD and LUSOL in a wide-spread sparse matrix library SPARSEKIT2.
In addition to these, there are tons of other methods which try to solve the Jx=-F faster and more accurate such as Generalized Minimal Residual (GMRES) and Stabilized Bi-Conjugate Gradient (BICGSTAB) which is a strand of literature.

Efficient way to calculate distance function

I have a 3D matrix (dimension nx,nz,ny) which corresponds to a physical domain. This matrix contains a continuous field from -1 (phase 1) to +1 (phase 2); the interface between the two phases is the level 0 of this field.
Now, I want to calculate efficiently the signed distance function from the interface for every point in the domain.
I tried two possibilities (sgn is the sign of my field, with values +1,0,-1, xyz contains the grid as triplets of x,y,z at each point and dist is the signed distance function I want to calculate).
double precision, dimension(nx,nz,ny) :: dist,sgn,eudist
integer :: i,j,k
double precision :: seed,posit,tmp(nx)
do j=1,ny
do k=1,nz
do i=1,nx
seed=sgn(i,k,j)
! look for interface
eudist=(xyz(:,:,:,1)-x(i))**2+(xyz(:,:,:,2)-z(k))**2+(xyz(:,:,:,3)-y(j))**2
! find min within mask
posit=minval(eudist,seed*sgn.le.0)
! tmp fits in cache, small speed-up
tmp(i)=-seed*dsqrt(posit)
enddo
dist(:,k,j)=tmp
enddo
enddo
I also tried a second version, which is quite similar to the above one but it calculates the Euclidean distance only in a subset of the whole matrix. With this second version there is some speed up, but it is still too slow. I would like to know whether there is a more efficient way to calculate the distance function.
Second version:
double precision, dimension(nx,nz,ny) :: dist,sgn
double precision, allocatable, dimension(:,:,:) :: eudist
integer :: i,j,k , ii,jj,kk
integer :: il,iu,jl,ju,kl,ku
double precision :: seed, deltax,deltay,deltaz,tmp(nx)
deltax=max(int(nx/4),1)
deltay=max(int(ny/4),1)
deltaz=max(int(nz/2),1)
allocate(eudist(2*deltax+1,2*deltaz+1,2*deltay+1))
do j=1,ny
do k=1,nz
do i=1,nx
! look for closest point in box 2*deltax+1,2*deltaz+1,2*deltay+1
il=max(1,i-deltax)
iu=min(nx,i+deltax)
jl=max(1,j-deltay)
ju=min(ny,j+deltay)
kl=max(1,k-deltaz)
ku=min(nz,k+deltaz)
eudist(:,1:ku-kl+1,:)=(xyz(il:iu,kl:ku,jl:ju,1)-x(i))**2 &
& +(xyz(il:iu,kl:ku,jl:ju,2)-z(k))**2 &
& +(xyz(il:iu,kl:ku,jl:ju,3)-y(j))**2
seed=sgn(i,k,j)
tmp(i)=minval(eudist(:,1:ku-kl+1,:),seed*sgn(il:iu,kl:ku,jl:ju).le.0)
tmp(i)=-seed*dsqrt(tmp(i))
enddo
dist(:,k,j)=tmp
enddo
enddo
eudist: Euclidean distance between the point i,k,j and any other point in a box 2*deltax+1,2*deltaz+1,2*deltay+1 centered in i,k,j. This reduces computational cost, as the distance is calculated only in a subset of the whole grid (here I am assuming that the subset is large enough to contain an interfacial point).
After Vladimir suggestion (x,y,z are the axes determining grid position, xyz(i,k,j)=(x(i),z(k),y(j)) ):
double precision, dimension(nx,nz,ny) :: dist,sgn
double precision :: x(nx), y(ny), z(nz)
double precision, allocatable, dimension(:,:,:) :: eudist
double precision, allocatable, dimension(:) :: xd,yd,zd
integer :: i,j,k , ii,jj,kk
integer :: il,iu,jl,ju,kl,ku
double precision :: seed, deltax,deltay,deltaz,tmp(nx)
deltax=max(int(nx/4),1)
deltay=max(int(ny/4),1)
deltaz=max(int(nz/2),1)
allocate(eudist(2*deltax+1,2*deltaz+1,2*deltay+1))
allocate(xd(2*deltax+1))
allocate(yd(2*deltay+1))
allocate(zd(2*deltaz+1))
do j=1,ny
do k=1,nz
do i=1,nx
! look for closest point in box 2*deltax+1,2*deltaz+1,2*deltay+1
il=max(1,i-deltax)
iu=min(nx,i+deltax)
jl=max(1,j-deltay)
ju=min(ny,j+deltay)
kl=max(1,k-deltaz)
ku=min(nz,k+deltaz)
do ii=1,iu-il+1
xd(ii)=(xyz(il+ii-1)-x(i))**2
end do
do jj=1,ju-jl+1
yd(jj)=(y(jj+jl-1)-y(j))**2
end do
do kk=1,ku-kl+1
zd(kk)=(z(kk+kl-1)-z(k))**2
end do
do jj=1,ju-jl+1
do kk=1,ku-kl+1
do ii=1,iu-il+1
eudist(ii,kk,jj)=xd(ii)+yd(jj)+zd(kk)
enddo
enddo
enddo
seed=sgn(i,k,j)
tmp(i)=minval(eudist(:,1:ku-kl+1,:),seed*sgn(il:iu,kl:ku,jl:ju).le.0)
tmp(i)=-seed*dsqrt(tmp(i))
enddo
dist(:,k,j)=tmp
enddo
enddo
EDIT: more information on the problem at hand.
The grid is an orthogonal grid mapped to a matrix. The number of points of this grid is of the order of 1000 in each direction (in total about 1 billion points).
My goal is switching from a sign function (+1,0,-1) to a signed distance function in the entire grid in an efficient way.
I would still do what I suggested, no matter if you do that on a subset or across the whole plane. Take advantage of the orthogonal grid, it is a great thing to have
do j=1,ny
do k=1,nz
do i=1,nx
! look for closest point in box 2*deltax+1,2*deltaz+1,2*deltay+1
il=max(1,i-deltax)
iu=min(nx,i+deltax)
jl=max(1,j-deltay)
ju=min(ny,j+deltay)
kl=max(1,k-deltaz)
ku=min(nz,k+deltaz)
do ii = il,iu
xd(i) = (xyz(ii,kl:ku,jl:ju,1)-x(i))**2
end do
do jj = jl,ju
yd(i) = (xyz(il:iu,kl:ku,jj,2)-y(j))**2
end do
do kk = kl,ku
zd(k) = (xyz(il:iu,kk,jl:ju,3)-z(k))**2
end do
do jj = jl,ju
do kk = kl,ku
do ii = il,iu
eudist(il:iu,kl:ku,jl:ju) = xd(ii) + yd(jj) + zd(kk)
end do
end do
end do
....
enddo
dist(:,k,j)=tmp
enddo
enddo
Consider separating the whole thing that is inside the outer triple loop into a subroutine or a function. It would not be faster, but it would be much more readable. Especially for us here, It would be enough for us here to only deal with that function, the outer loop is just a confusing extra layer.

Secant method solving for pipe diameter

I am trying to write a program to solve for pipe diameter for a pump system I've designed. I've done this on paper and understand the mechanics of the equations. I would appreciate any guidance.
EDIT: I have updated the code with some suggestions from users, still seeing quick divergence. The guesses in there are way too high. If I figure this out I will update it to working.
MODULE Sec
CONTAINS
SUBROUTINE Secant(fx,xold,xnew,xolder)
IMPLICIT NONE
INTEGER,PARAMETER::DP=selected_real_kind(15)
REAL(DP), PARAMETER:: gamma=62.4
REAL(DP)::z,phead,hf,L,Q,mu,rho,rough,eff,pump,nu,ppow,fric,pres,xnew,xold,xolder,D
INTEGER::I,maxit
INTERFACE
omitted
END INTERFACE
Q=0.0353196
Pres=-3600.0
z=-10.0
L=50.0
mu=0.0000273
rho=1.940
nu=0.5
rough=0.000005
ppow=412.50
xold=1.0
xolder=0.90
D=11.0
phead = (pres/gamma)
pump = (nu*ppow)/(gamma*Q)
hf = phead + z + pump
maxit=10
I = 1
DO
xnew=xold-((fx(xold,L,Q,hf,rho,mu,rough)*(xold-xolder))/ &
(fx(xold,L,Q,hf,rho,mu,rough)-fx(xolder,L,Q,hf,rho,mu,rough)))
xolder = xold
xold = xnew
I=I+1
WRITE(*,*) "Diameter = ", xnew
IF (ABS(fx(xnew,L,Q,hf,rho,mu,rough)) <= 1.0d-10) THEN
EXIT
END IF
IF (I >= maxit) THEN
EXIT
END IF
END DO
RETURN
END SUBROUTINE Secant
END MODULE Sec
PROGRAM Pipes
USE Sec
IMPLICIT NONE
INTEGER,PARAMETER::DP=selected_real_kind(15)
REAL(DP)::xold,xolder,xnew
INTERFACE
omitted
END INTERFACE
CALL Secant(f,xold,xnew,xolder)
END PROGRAM Pipes
FUNCTION f(D,L,Q,hf,rho,mu,rough)
IMPLICIT NONE
INTEGER,PARAMETER::DP=selected_real_kind(15)
REAL(DP), PARAMETER::pi=3.14159265d0, g=9.81d0
REAL(DP), INTENT(IN)::L,Q,rough,rho,mu,hf,D
REAL(DP)::f, fric, reynold, coef
fric=(hf/((L/D)*(((4.0*Q)/(pi*D**2))/2*g)))
reynold=((rho*(4.0*Q/pi*D**2)*D)/mu)
coef=(rough/(3.7d0*D))
f=(1/SQRT(fric))+2.0d0*log10(coef+(2.51d0/(reynold*SQRT(fric))))
END FUNCTION
You very clearly declare the function in the interface (and the implementation) as
FUNCTION f(L,D,Q,hf,rho,mu,rough)
IMPLICIT NONE
INTEGER,PARAMETER::DP=selected_real_kind(15)
REAL(DP), PARAMETER::pi=3.14159265, g=9.81
REAL(DP), INTENT(IN)::L,Q,rough,rho,mu,hf,D
REAL(DP)::fx
END FUNCTION
So you need to pass 7 arguments to it. And none of them are optional.
But when you call it, you call it as
xnew=xold-fx(xold)*((xolder-xold)/(fx(xolder)-fx(xold))
supplying a single argument to it. When you try to compile it with gfortran for example, the compiler will complain for not getting any argument for D (the second dummy argument), because it stops with the first error.
It seems that the initial values for xold and xolder are too far from the solution. If we change them as
xold = 3.0d-5
xolder = 9.0d-5
and changing the threshold for convergence more tightly as
IF (ABS(fx(xnew,L,Q,hf,rho,mu,rough)) <= 1.0d-10) THEN
then we get
...
Diameter = 7.8306011049894322E-005
Diameter = 7.4533171406818087E-005
Diameter = 7.2580746283970710E-005
Diameter = 7.2653611474296094E-005
Diameter = 7.2652684750264582E-005
Diameter = 7.2652684291155581E-005
Here, we note that the function f(x) is defined as
FUNCTION f(D,L,Q,hf,rho,mu,rough)
...
f = (1/(hf/((L/D)*((4*Q)/pi*D)))) !! (1)
+ 2.0 * log( (rough/(3.7*D)) + (2.51/(((rho*((4*Q)/pi*D))/mu) !! (2)
* (hf/((L/D)*((4*Q)/pi*D))))) !! (3)
)
END FUNCTION
where terms in Lines (1) and (3) are both constant, while terms in Line (2) are some constants over D. So, we see that f(D) = c1 - 2.0 * log( D / c2 ), so we can obtain the solution analytically as D = c2 * exp(c1/2.0) = 7.26526809959e-5, which agrees well with the numerical solution above. To get a rough idea of where the solution is, it is useful to plot f(D) as a function of D, e.g. using Gnuplot.
But I am afraid that the expression for f(D) itself (given in the Fortran code) might include some typo due to many parentheses. To avoid such issues, it is always useful to first arrange the expression for f(D) as simplest as possible before making a program.
(One TIP is to extract constant factors outside and pre-calculate them.)
Also, for debugging purposes it is sometimes useful to check the consistency of physical dimensions and physical units of various terms. Indeed, if the magnitude of the obtained solution is too large or too small, there might be some problem of conversion factors for physical units, for example.

Can hidden characters change the name of output files in fortran?

I'm using a subroutine to output an array gathered from a previous function. After running my program and only calling the subroutine once, my intended file name, testf.txt, is given as "testf.txtÑ #." Thinking it was a hidden character generated by a portion I borrowed from a friend, I retyped the whole section but nothing changed. When I do multiple calls, the names of the the files add onto one another. Intended, "testf.txt" "testg.txt" "testh.txt", given, "testf.txttestg" "testg.txttesth" and "testh.txt". The content is fine, the name being the only issue. Would that be caused by hidden characters in the code, or a logic error in how the subroutine hold information?
Thanks,
Thomas
Main Program
program Main
implicit none
!DATA DICTIONARY
character :: selection ! Method of summation
! L = left Riemann sum
! R = right Riemann sum
! M = Midpoint rule
! T = Trapezoidal rule sum
! S = Simpson's rule
integer :: n ! Number of subintervals the integral is to
! be divided into
integer :: func ! The function to be integrated
! f = (28.0/5) + 2*sin(4*x) - (1/x)
! g = ln(4x) + (19x)^3 - 43
! h = 1/(sqrt(2.0*PI))*exp(-(x**2)/2.0)
character(20) :: filename
real, external :: f
real, external :: g
real, external :: h
real, dimension(1000) :: area
!Declare functions used
real :: Quad
write(*,*) "Written in the form ",
+ "Quad(func, n, left, right, selection):"
write(*,*) " "
!Test Drivers
write(*, *) Quad(f, 10, 0.2, 3.2, 'T', area)
call Out(10, area, 'testf.txt')
write(*, *) Quad(g, 10, 5.0, 10.0, 'L', area)
call Out(10, area, 'testg.txt')
write(*, *) Quad(h, 25, 1.0, 10.0, 'L', area)
call Out(10, area, 'testh.txt')
end program Main
Actual Out subroutine
subroutine Out(n, area, filename)
!PRE: n > 0
! area is the initialized array area(1...n)
! filename is initialized
implicit none
integer, intent(in) :: n
real, dimension(n), intent(in) :: area
character(20), intent(in) :: filename
integer :: i
open(unit=1, file=filename, form='formatted', !Writes area(1...n) to a file
+ action='write', status='replace')
do i = 1, n, 1
write(1,*) area(i)
end do
close(unit=1)
end subroutine Out
This sort of error is normally an indication of one or more of:
Array overruns - i.e. if your data is in an array of size 10 followed by your filen ame in memory and you write to array element 11 it will corrupt the file name.
Array under run - similar to the above but at the start of the array
Parameter corruption - if the text for the file name is passed as a read write parameter then it can be overwritten in the code.
You can often sort out this sort of issue by a) turn on all compiler error checking and fix all the warnings and/or b running the code through a lint equivalent such as ftnchek and fixing the issues it finds.

reading selected data from multiple files

I need some technical help in modifying my FORTRAN coding. I have searched the Internet but I can't fine one which can solve my need.
Basically I am analyzing simulation data using FORTRAN program. Firstly, I shall explain the format of my data to make easy the understanding of what I want. I have 10 files. Each file contains x, y z, data for 1000 frames and each frame contains 20736 (x,y,z) data. Since the total size of all data is about 10 GB for all 10,000 frames, I have to break them into small chunks (10 files) to avoid any crash during calculation. At the beginning of each file (first line) there is a text which can be neglected and each frame ending with information of the box size (bx,by,bz). This is the format of my data files.
I have attached the coding which I have been using for analysis.
The current codding will do calculation file after file and frame after frame in the sequential order. But now I want to do the calculation on selected frames only by jumping frame after frame with certain pattern. For example, I choose frame 1, 4, 8, 12, 16.... and so on until the last frame (10,000).
I have no idea how to choose the frames which are more then 1000 which fall in the second or third files.
I have shown my code below:
module all_parameter
integer,parameter :: MAXATOM=20736
integer,parameter :: nat = 20736
integer, parameter :: startFiles=31
integer, parameter :: endFiles=40
integer,parameter :: NO_OF_FILES=10
integer,parameter :: FRAMES_IN=1000
integer, parameter :: totalFrames = ( NO_OF_FILES * FRAMES_IN )
integer :: i, j, k, IFram, nhb, nlipid, jjj
integer :: BIN, iat, jat
!real :: DELR, fnid, GNRM, RCUT, rlower, rupper
real :: junk, dR, bx, by, bz, bbx
real :: only_head, only_tail, only_water
real :: mass_head, mass_tail, mass_water
character*4 at(MAXATOM)
real,dimension(MAXATOM) :: x, y, z
real,dimension(3) :: rcm
real,dimension(MAXATOM) :: rx, ry, rz
real,dimension(MAXATOM) :: mass
integer, parameter :: startlipid=1
integer, parameter :: endlipid=64
integer, parameter :: lipidNo=64
real, parameter :: PI = (22.0/7.0)
real, dimension(startlipid:endlipid) :: array_UniVekLx, array_UniVekLy, array_UniVekLz
integer :: no, no2, c71, c72, c80, c81
real :: p1x, p1y, p1z, p2x, p2y, p2z, vekx, veky, vekz
real :: mag_vekp1p2, unit_vekx, unit_veky, unit_vekz
real :: sum_UniVekLx, sum_UniVekLy, sum_UniVekLz
real :: avg_frame_vekx, avg_frame_veky, avg_frame_vekz
real :: xx, yy, zz, frame_MagLipVek, theta,theta2, uni_frame_Vekx, uni_frame_Veky, uni_frame_Vekz
real :: xxx, yyy, zzz,UniVekLx, UniVekLy, UniVekLz, FrameAvgUniVekMag
real :: avg_UniVekLx, avg_UniVekLy,avg_UniVekLz, MagLipVek
end module all_parameter
PROGRAM order_parameter
use all_parameter
implicit none
!=========================================================================
! Open files to be read and to write on
!=========================================================================
!
! Topology file !CHANGE
open(unit=31,status="old",file="../malto_thermoNEW_Ori_50ns-set1.traj ")
open(unit=32,status="old",file="../malto_thermoNEW_Ori_50ns-set2.traj ")
open(unit=33,status="old",file="../malto_thermoNEW_Ori_50ns-set3.traj ")
open(unit=34,status="old",file="../malto_thermoNEW_Ori_50ns-set4.traj ")
open(unit=35,status="old",file="../malto_thermoNEW_Ori_50ns-set5.traj ")
open(unit=36,status="old",file="../malto_thermoNEW_Ori_50ns-set6.traj ")
open(unit=37,status="old",file="../malto_thermoNEW_Ori_50ns-set7.traj ")
open(unit=38,status="old",file="../malto_thermoNEW_Ori_50ns-set8.traj ")
open(unit=39,status="old",file="../malto_thermoNEW_Ori_50ns-set9.traj ")
open(unit=40,status="old",file="../malto_thermoNEW_Ori_50ns-set10.traj ")
! Open New Files
open(unit=51,status="unknown",file="BOXinfo.dat ")
open(unit=75,status="unknown",file="magnitude_theta_lipid-thermo-malto.dat")
do k = startlipid, endlipid
array_UniVekLx(k) =0.0
array_UniVekLy(k) =0.0
array_UniVekLz(k) =0.0
end do
! READ COORDINATES IN FRAMES FROM TRAJ file
INPUTFILES: do jjj = startFiles, endFiles
! LOOP OVER FRAMES
IFram = 1
read(jjj,'(a)') junk
!&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
!IFram = 1
WHOLE: do while ( IFram <= FRAMES_IN)
read(jjj,'(10F8.3)') (x(i),y(i),z(i),i = 1,nat) ! reading TRAJ file
read(jjj,'(3F8.3)') bx,by,bz
write(51,'(a,3F8.3)') 'BOXINFO', bx,by,bz
! LOOP OVER ATOMS
loop1: do j = startlipid, endlipid !nat in lipids
loop2: do i = 45, 45 !,3 !atoms in a lipid
no= i + (j-1)*81
!no2= (no + 18)
c71=no
c72=(no+3)
p1x=((x(c71) + x(c72))/2.0 )
p1y=((y(c71) + y(c72))/2.0 )
p1z=((z(c71) + z(c72))/2.0 )
.
.
.
.
enddo loop2 ! going to next lipid
!CLOSE LOOP OVER ATOMS
enddo loop1 ! going to next frame , before that
!CLOSE LOOP OVER A FRAME
IFram = IFram + 1
enddo WHOLE
!CLOSE LOOP OVER ALL FILES
enddo INPUTFILES
end program order_parameter
I really appreciate your help in advance.
Thanks.
Well, in the loop, unless mod(framenumber, 4) == 0 skip to the next iteration (the CYCLE statement does that in Fortran). That way, you'll process only every fourth frame.
Also, you way want to use a somewhat more precise value for PI. 22/7, WTF?
Obviously the program needs to know which file it is reading, 1, 2... 10. Call that ifile. Then the frame with the file, say frame_in_file. Then you have frame = (ifile-1) * frame_in_file. Do you have a rule to decide whether you want to process "frame"? If the rule is to process every fourth, use mod and cycle as suggested #janneb. Otherwise, I'm not sure what you are asking.
With ten files, it would be easier to write the filename to a string and process them with a loop, opening each file in turn with the same unit number and closing at the end of the loop. This would be a little easier if you used the convention that the number in the file name was always two digits, with a leading zero if less than 10:
do ifile=1, 10
write (filename, '( "myfile_number", I2.2, ".data" )' ) ifile
open (unit=30, file=filename, ...)
loop: read from the file...
calculate overall frame number ...
cycle out of read loop if unsuitable frame...
process the frame...
end read loop
close (unit=30)
end do