I want to add elements to a 1-D array mat, subject to a condition, as in the test program below. In Fortran 2003 you can append an element with mat=[mat,i], as mentioned in the related question "Fortran array automatically growing when adding a value". Unfortunately, this is very slow for large arrays. So I tried to work around this by writing the array elements to an unformatted file and reading them back afterwards. This turned out to be much faster than using mat=[mat,i]. For example, for n=2000000_ilong the run time is 5.1078133666666661 minutes, whereas if I store the array elements in the file the run time drops to 3.5234166666666665E-003 minutes.
The problem is that for large arrays the file storage.dat can be hundreds of GB...
Any ideas?
program test
implicit none
integer, parameter :: ndig=8
integer, parameter :: ilong=selected_int_kind(ndig)
integer (ilong), allocatable :: mat(:)
integer (ilong), parameter :: n=2000000_ilong
integer (ilong) :: i, cn
logical, parameter :: store=.false.
double precision :: z, START_CLOCK, STOP_CLOCK
open(1, file='storage.dat',form='unformatted')
call cpu_time(START_CLOCK)
if(store) then
   ! write the selected values to the file, counting them in cn
   cn=0
   do i=1,n
      call random_number(z)
      if (z<0.5d0) then
         write(1) i
         cn=cn+1
      end if
   end do
   ! read them back into an array of exactly the right size
   rewind(1); allocate(mat(cn)); mat=0
   do i=1,cn
      read(1) mat(i)
   end do
else
   ! grow the array by one element at a time (start empty)
   allocate(mat(0))
   do i=1,n
      call random_number(z)
      if (z<0.5d0) then
         mat=[mat,i]
      end if
   end do
end if
call cpu_time(STOP_CLOCK)
print *, 'run took:', (STOP_CLOCK - START_CLOCK)/60.0d0, 'minutes.'
end program test
If the data file is hundreds of gigabytes, then there may be no solution at all, because you need that much RAM for your array anyway. Perhaps you made the mistake of storing the data as text; in that case the in-memory size will be somewhat lower than the file size, but still tens of GB.
What is often done when you need to add elements one by one without knowing the final size is to grow the array geometrically. That means you pre-allocate an array of size N. When the array is full, you allocate a new array of size 2*N and copy the old contents into it. When that is full again, you grow it to 4*N, and so on, until either you are finished or you have exhausted your memory.
Of course, it is often best to know the size of the array beforehand, but in some algorithms you simply do not have the information.
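A minimal sketch of the doubling strategy described above, written as plain procedures rather than a derived type. The module and procedure names (grow_mod, append, shrink_to_fit) and the starting capacity of 16 are illustrative assumptions. The caller keeps a separate count n of the elements in use and sets it to 0 before the first call.

```fortran
module grow_mod
  ! Sketch of geometric (doubling) growth for an allocatable array.
  ! grow_mod, append and shrink_to_fit are hypothetical names.
  use iso_fortran_env, only: int64
  implicit none
  private
  public :: append, shrink_to_fit
contains

  ! Append val to arr; n is the number of elements currently in use.
  subroutine append(arr, n, val)
    integer(int64), allocatable, intent(inout) :: arr(:)
    integer(int64), intent(inout) :: n
    integer(int64), intent(in) :: val
    integer(int64), allocatable :: tmp(:)

    if (.not. allocated(arr)) then
      allocate(arr(16))          ! assumed starting capacity
      n = 0
    end if
    if (n == size(arr, kind=int64)) then
      ! Full: allocate twice the capacity and copy the used part across.
      allocate(tmp(max(16_int64, 2 * size(arr, kind=int64))))
      tmp(1:n) = arr(1:n)
      call move_alloc(tmp, arr)  ! arr now owns the larger block
    end if
    n = n + 1
    arr(n) = val
  end subroutine append

  ! Trim the unused capacity once all elements have been added.
  subroutine shrink_to_fit(arr, n)
    integer(int64), allocatable, intent(inout) :: arr(:)
    integer(int64), intent(in) :: n
    integer(int64), allocatable :: tmp(:)

    allocate(tmp(n))
    tmp = arr(1:n)
    call move_alloc(tmp, arr)
  end subroutine shrink_to_fit

end module grow_mod
```

In the question's loop, with mat, cn and i declared as integer(int64), this would replace mat=[mat,i] by call append(mat, cn, i) (with cn = 0 beforehand), followed by a single call shrink_to_fit(mat, cn) after the loop. Doubling keeps the total amount of copying proportional to the number of elements, whereas mat=[mat,i] copies the whole array on every append.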
Maybe you need a dynamic container such as C++'s std::vector, with a push_back() function.
The following is a simplified version. You probably ought to check the allocation to make sure that you don't run out of addressable memory.
Note the need for random_seed.
module container
use iso_fortran_env, only: int64
implicit none
type array
   integer(int64), allocatable :: A(:)
   integer(int64) :: num = 0
contains
   procedure :: push_back
   procedure :: print
end type array
interface array ! additional constructors
   module procedure array_constructor
end interface array
contains
function array_constructor() result( this ) ! performs initial allocation
   type(array) :: this
   allocate( this%A(1) )
   this%num = 0
end function array_constructor
subroutine push_back( this, i )
   class(array), intent(inout) :: this
   integer(int64), intent(in) :: i
   integer(int64), allocatable :: temp(:)
   if ( size(this%A) == this%num ) then ! Need to resize
      allocate( temp( max( 1_int64, 2 * this%num ) ) ) ! <==== double the capacity, for example
      temp(1:this%num) = this%A(1:this%num)
      call move_alloc( temp, this%A )
      ! print *, "Resized to ", size( this%A ) ! debugging only!!!
   end if
   this%num = this%num + 1
   this%A(this%num) = i
end subroutine push_back
subroutine print( this )
   class(array), intent(in) :: this
   write( *, "( *( i0, 1x ) )" ) this%A(1:this%num)
end subroutine print
end module container
program test
use iso_fortran_env, only: int64, real64
use container
implicit none
type(array) :: mat
integer(int64) :: n = 2000000_int64
integer(int64) :: i
real(real64) :: z, START_CLOCK, STOP_CLOCK
mat = array() ! initial trivial allocation
call random_seed ! you probably need this
call cpu_time(START_CLOCK)
do i = 1, n
   call random_number( z )
   if ( z < 0.5_real64 ) call mat%push_back( i )
end do
call cpu_time(STOP_CLOCK)
print *, 'Run took ', ( STOP_CLOCK - START_CLOCK ) / 60.0_real64, ' minutes.'
! call mat%print ! debugging only!!!
end program test
This is more a question about best practice in writing Fortran code than about fixing an error.
I have the following code with a large array that needs to be passed to a subroutine for some calculation:
program name
implicit none
integer, parameter:: n = 10**8
complex(kind=8) :: x(n)
integer :: i, nVal
nVal = 30
do i =1,1000
   call test(x,nVal)
   !-----other calculations-----!
   ! after every step nVal changes, and after a few steps nVal converges
   ! e.g. `nVal` starts from 30 and converges at 14 after 10-15 steps, and stays there for the rest of the loop
   ! once `nVal` converges, `workarray` requires much less memory than it does at the start
end do
contains
subroutine test(arr,m)
   integer , intent(inout) :: m
   complex(kind=8), intent(inout) :: arr(n)
   complex(kind=8) :: workarray(n,m) ! <-- large workspace
   !----- do calculation-----------!
   !--- check convergence of `m`----!
end subroutine test
end program name
The size of the internal workarray depends on a value that decreases gradually, converges, and stays there for the rest of the run. If I check the memory usage with top, it stays at 27% from start to finish, although after a few steps the memory requirement should decrease too.
So, I modified the code to use allocatable workarray like this,
program name
implicit none
integer, parameter:: n = 10**8
complex(kind=8) :: x(n)
integer :: i, nVal, oldVal
complex(kind=8), allocatable :: workarray(:,:)
nVal = 30
oldVal = nVal
allocate(workarray(n,nVal))
do i =1,1000
   ! all calculation of the subroutine `test` brought to this main code
   !--- check convergence of `nVal`----!
   if(nVal /= oldVal) then
      ! nVal changed: resize the workspace to the new value
      deallocate(workarray)
      allocate(workarray(n,nVal))
      oldVal = nVal
   end if
end do
end program name
Now, if I use top, the memory usage starts at about 28%, then decreases and converges to 19%.
My question is how I should write code for situations like this. The allocatable option does decrease the memory requirement, but it also hampers readability a little and introduces code duplication in several places. On the other hand, the first option keeps a large amount of memory allocated the whole time, when much less would suffice. So, what is the preferred way of coding in this situation?
I can't help you decide which of the two methods is better; it will depend on how you (or the users of your code) value the potential tradeoff between memory use and cpu use. However, I can suggest a better version of your second method.
Rather than passing workarray in and out of test, you can keep it local to test and use the save attribute to make it persistent between procedure calls.
This would look something like
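program name
implicit none
integer, parameter :: dp = selected_real_kind(15,300)
integer, parameter:: n = 10**8
complex(dp) :: x(n)
integer :: i, nVal
nVal = 30
do i =1,1000
   call test(x,nVal)
end do
contains
subroutine test(arr,m)
   complex(dp), intent(inout) :: arr(:)
   integer, intent(inout) :: m
   ! The save attribute keeps workarray allocated between calls.
   ! An allocatable variable cannot be given an initial value in its
   ! declaration, so check allocated() explicitly instead.
   complex(dp), allocatable, save :: workarray(:,:)
   ! Reallocate workarray only if it is missing or too small.
   if (.not. allocated(workarray)) then
      allocate(workarray(size(arr), m))
   else if (size(workarray, 2) < m) then
      deallocate(workarray)
      allocate(workarray(size(arr), m))
   end if
   !----- do calculation using workarray(:, 1:m) -----!
   !--- check convergence of `m` ----!
end subroutine test
end program name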
If m is likely to increase slowly, you may also want to consider replacing allocate(workarray(size(arr), m)) with allocate(workarray(size(arr), 2*m)), so that you get C++ std::vector-style memory management.
The main downside of this approach (besides not reducing the memory use) is that you need to be more careful if you want to run parallel code which uses procedures with saved variables.
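If shrinking the workspace matters (as it does in the question), the saved variable can instead be reallocated whenever its required shape changes, in either direction. The sketch below is only one way to do that: it assumes the procedure lives in a module, adds a hypothetical optional argument ncols (only so the caller can observe the current width), replaces the real calculation with a stub, and takes m as intent(in), so the convergence check is left out.

```fortran
module workspace_mod
  ! Sketch: a saved workspace that is reallocated whenever its required
  ! shape changes, so memory drops once m converges to a smaller value.
  ! The module name and the ncols argument are hypothetical.
  implicit none
  private
  public :: dp, test
  integer, parameter :: dp = selected_real_kind(15, 300)
contains
  subroutine test(arr, m, ncols)
    complex(dp), intent(inout) :: arr(:)
    integer, intent(in) :: m
    integer, intent(out), optional :: ncols   ! reports the workspace width
    complex(dp), allocatable, save :: workarray(:,:)

    ! Drop the old workspace if its shape no longer matches.
    ! (Nested IF: Fortran does not short-circuit .or., and size() must
    ! not be called on an unallocated array.)
    if (allocated(workarray)) then
      if (size(workarray, 1) /= size(arr) .or. size(workarray, 2) /= m) &
        deallocate(workarray)
    end if
    if (.not. allocated(workarray)) allocate(workarray(size(arr), m))

    ! Stand-in for the real calculation.
    workarray = (0.0_dp, 0.0_dp)

    if (present(ncols)) ncols = size(workarray, 2)
  end subroutine test
end module workspace_mod
```

The cost is one deallocate/allocate pair each time m changes, which in the question happens only during the first 10 to 15 steps; after that the saved workspace is simply reused.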
There are several threads with titles similar to mine, but I do not believe they are the same. One was very similar, "fortran pass allocated array to main procedure", but the answer required Fortran 2008. I am after a Fortran 90/95 solution.
Another very good and quite similar thread is "Dynamic array allocation in fortran90". However, in that method they allocate in the subroutine but never appear to deallocate, which seems odd. My method looks, on the surface at least, to be the same, yet when I print the array in the main program, only blank spaces are printed. When I print it in the subroutine itself, the correct values are printed, and the correct number of values.
In the following, a MAIN program calls a subroutine. This subroutine reads data into an allocatable array and passes the array back to the main program. I do this by using small subroutines, each designed to look for specific terms in the input file. All of these subroutines are in one module file. So there are three files: Main.f90, input_read.f90 and filename.inp.
It seems, then, that I do not know how to pass an array that is allocatable both in program Main.f90 and in the called subroutine, where it is actually allocated, sized, and then deallocated before being passed back to program Main. This perhaps sounds confusing, so here is the code for all three files. I apologize for the poor formatting when I pasted it. I tried to separate all the rows.
Program main
use input_read ! the module with the subroutines used for reading filename.inp
implicit none
REAL, Allocatable :: epsilstar(:)
INTEGER :: natoms
call Obtain_LJ_Epsilon(epsilstar, natoms)
print*, 'LJ Epsilon : ', epsilstar
END Program main
Next is the module with a subroutine (I removed all but the necessary one for space), input_read.f90:
module input_read
implicit none
contains
Subroutine Obtain_LJ_Epsilon(epsilstar,natoms)
! Reads epsilon and sigma parameters for Lennard-Jones Force-Field and also
! counts the number of types of atoms in the system
! (units 10 and 11 are assumed to be opened elsewhere; the OPEN statements are not shown)
INTEGER :: error,line_number,natoms_eps,i
CHARACTER(120) :: string, next_line, next_next_line,dummy_char
CHARACTER(8) :: dummy_na,dummy_eps
INTEGER,intent(out) :: natoms
LOGICAL :: Proceed
real, intent(out), allocatable :: epsilstar(:)
error = 0
line_number = 0
Proceed = .true.
! Find key word LJ_Epsilon
do
   line_number = line_number + 1
   Read(10,'(A120)',iostat=error) string
   IF (error .NE. 0) THEN
      print*, "Error, stopping read input due to an error reading line"
      Proceed = .false.
      exit
   END IF
   IF (string(1:12) == '$ LJ_epsilon') THEN
      line_number = line_number + 1
      exit
   ELSE IF (string(1:3) == 'END' .or. line_number > 2000) THEN
      print*, "Hit end of file before reading '$ LJ_epsilon' "
      Proceed = .false.
      exit
   END IF
end do
! Key word found, now determine number of parameters
! needing to be read
natoms_eps = -1
dummy_eps = 'iii'
do while ((dummy_eps(1:1) .ne. '$') .and. (dummy_eps(1:1) .ne. ' '))
   natoms_eps = natoms_eps + 1
   read(10,*) dummy_eps
enddo !we now know the number of atoms in the system (# of parameters)
natoms = natoms_eps
allocate(epsilstar(natoms_eps))
epsilstar = 0.0
! Number of parameters found, now read their values
if(Proceed) then
   do i = 1,line_number-1
      read(11,*) ! note it is not recording anything for this do loop
   end do
   do i = 1,natoms_eps
      read(11,*) dummy_char
      read(dummy_char,*) epsilstar(i) ! convert string read in to real, and store in epsilstar
   end do
end if
PRINT*, 'LJ_epsilon: ', epsilstar ! printing to make sure it worked
deallocate(epsilstar) ! <-- the mistake: see the resolution below
END Subroutine Obtain_LJ_Epsilon
end module input_read
And finally the input file: filename.inp
# Run_Type
# Run_Name
# Pressure
# Temperature
# Number_Species
# LJ_epsilon
# LJ_sigma
And again, I can't figure out how to pass the allocated epsilstar array to the main program. I have tried passing an unallocated array from main.f90 to the subroutine, allocating it inside, passing it back, and deallocating it in main.f90, but that did not work. I have also tried it as the code currently is: it compiles and runs without errors, but it does not pass epsilstar back from the subroutine, where the values are correctly found and stored in an array.
It turns out that the mistake I made was in deallocating the array in the subroutine before passing it to the main program. By NOT deallocating, the array was sent back fine. Also, I do not deallocate in the main program either.
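For reference, a minimal sketch of the pattern that works: the subroutine allocates an intent(out) allocatable dummy argument and does not deallocate it, and the caller deallocates once it no longer needs the data. get_params and its dummy values are hypothetical stand-ins for reading filename.inp. Allocatable dummy arguments are a Fortran 2003 feature (many Fortran 95 compilers supported them through the TR 15581 extension) and require an explicit interface, which the module provides. An intent(out) allocatable argument is also deallocated automatically on entry, so calling the routine again does not need a manual deallocate first.

```fortran
module param_reader
  ! Sketch: a subroutine that allocates an array and hands it back.
  ! get_params is hypothetical; it fills the array with dummy values
  ! instead of reading an input file.
  implicit none
  private
  public :: get_params
contains
  subroutine get_params(vals, nvals)
    ! intent(out): if vals is already allocated on entry, it is
    ! deallocated automatically, so repeated calls are safe.
    real, allocatable, intent(out) :: vals(:)
    integer, intent(in) :: nvals
    integer :: i

    allocate(vals(nvals))
    do i = 1, nvals
      vals(i) = 0.5 * i
    end do
    ! No deallocate here: the caller owns the array from now on.
  end subroutine get_params
end module param_reader
```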
I am writing a generic subroutine in fortran90 that will read in a column of data (real values). The subroutine should first check to see that the file exists and can be opened, then it determines the number of elements (Array_Size) in the column by reading the number of lines until end of file. Next the subroutine rewinds the file back to the beginning and reads in the data points and assigns each to an array (Column1(n)) and also determines the largest element in the array (Max_Value). The hope is that this subroutine can be written to be completely generic and not require any prior knowledge of the number of data points in the file, which is why the number of elements is first determined so the array, "Column1", can be dynamically allocated to contain "Array_Size" number of data points. Once the array is passed to the main program, it is transferred to another array and the initial dynamically allocated array is deallocated so that the routine can be repeated for multiple other input files, although this example only reads in one data file.
As written below, the program compiles just fine with the Intel Fortran compiler; however, when it runs it gives me a severe (174): SIGSEGV fault. I placed write(*,*) statements before and after the allocate statement in the subroutine, and it prints the first statement, "Program works here", but not the second, which indicates that the problem occurs at the ALLOCATE (Column1(Array_Size)) statement, between the two write(*,*) statements. I recompiled it with the -C flag and ran the executable, which fails again and states severe (408): "Attempt to fetch from allocatable variable MISC_ARRAY when it is not allocated". MISC_ARRAY is the actual argument in the main program, which seems to indicate that the compiler wants the array allocated in the main program and not in the subprogram. If I statically allocate the array, the program works just fine. To make the program generic and not require any knowledge of the size of each file, the array needs to be dynamically allocated, and this should happen in the subprogram, not the main program. Is there a way to accomplish this that I am not seeing?
PROGRAM Main
IMPLICIT NONE
! - variable Definitions for MAIN program
INTEGER :: n
REAL, ALLOCATABLE :: MISC_DATA(:)
! - Variable Definitions for EXPENSE READER Subprograms
REAL, ALLOCATABLE :: Misc_Array(:)
INTEGER :: Size_Misc
REAL :: Peak_Misc_Value
! REAL :: Misc_Array(365)
CHARACTER(LEN=13) :: File_Name
File_Name = "Misc.txt"
! One_Column is an external subroutine with no explicit interface here
CALL One_Column(File_Name,Size_Misc,Peak_Misc_Value,Misc_Array)
ALLOCATE(MISC_DATA(Size_Misc))
DO n = 1,Size_Misc ! Transfers array data
   MISC_DATA(n) = Misc_Array(n)
END DO
DEALLOCATE(Misc_Array)
END PROGRAM Main

SUBROUTINE One_Column(File_Name,Array_Size,Max_Value,Column1)
IMPLICIT NONE
CHARACTER(LEN=13), INTENT(IN) :: File_Name
INTEGER, INTENT(OUT) :: Array_Size
REAL, ALLOCATABLE, DIMENSION(:), INTENT(OUT) :: Column1
! REAL :: Column1(365)
REAL, INTENT(OUT) :: Max_Value
INTEGER :: Open_Status,Input_Status,n
! Open the file and check to ensure it is properly opened
OPEN(UNIT=100,FILE = File_Name,STATUS = 'old',ACTION = 'READ', &
     IOSTAT = Open_Status)
IF(Open_Status > 0) THEN
   WRITE(*,'(A,A)') "**** Cannot Open ",File_Name
   STOP
END IF
! Determine the size of the file
Array_Size = 0
DO
   READ(100,*,IOSTAT = Input_Status)
   IF(Input_Status < 0) EXIT
   Array_Size = Array_Size + 1
END DO
REWIND(100)
WRITE(*,*) "Program works here"
ALLOCATE (Column1(Array_Size))
WRITE(*,*) "Program stops working here"
Max_Value = 0.0
DO n = 1,Array_Size
   READ(100,*) Column1(n)
   IF(Column1(n) .GT. Max_Value) Max_Value = Column1(n)
END DO
CLOSE(100)
END SUBROUTINE One_Column
This is an educated guess: I think that the subroutine One_Column ought to have an explicit interface. As written the source code has 2 compilation units, a program (called main) and an external subroutine (called One_Column).
At compile time the compiler can't figure out the correct way to call the subroutine from the program. In good old (emphasis on old) Fortran style it takes a leap of faith, leaves it to the linker to find a subroutine with the right name, crosses its fingers (as it were), and hopes that the actual arguments match the dummy arguments at run time. This approach won't work for subroutines returning allocated data structures: a procedure with an allocatable dummy argument must have an explicit interface wherever it is called.
For a simple fix, move the end program statement to the end of the source file (after the subroutine), and in the line it vacated enter the keyword contains. One_Column then becomes an internal procedure, and the compiler will take care of creating the necessary interface.
For a more scalable fix, put the subroutine into a module and use-associate it.
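A sketch of the module-based fix for One_Column, assuming a Fortran 2008 compiler for newunit= (which picks a free unit number). The module name column_io is hypothetical. The routine opens the file once, counts the lines, rewinds, and reads; it uses maxval instead of starting Max_Value at 0.0 so that all-negative data is handled. Returning an empty array on an open failure, rather than stopping, is also an assumption.

```fortran
module column_io
  ! Because one_column lives in a module, any program that uses the
  ! module automatically sees its explicit interface.
  implicit none
  private
  public :: one_column
contains
  subroutine one_column(file_name, array_size, max_value, column1)
    character(len=*), intent(in) :: file_name
    integer, intent(out) :: array_size
    real, intent(out) :: max_value
    real, allocatable, intent(out) :: column1(:)
    integer :: u, ios, n

    open(newunit=u, file=file_name, status='old', action='read', iostat=ios)
    if (ios /= 0) then
      array_size = 0
      max_value = 0.0
      allocate(column1(0))          ! empty array on failure
      return
    end if

    ! First pass: count the lines.
    array_size = 0
    do
      read(u, *, iostat=ios)
      if (ios /= 0) exit
      array_size = array_size + 1
    end do

    ! Second pass: read the values into an array of the right size.
    rewind(u)
    allocate(column1(array_size))
    do n = 1, array_size
      read(u, *) column1(n)
    end do
    close(u)

    if (array_size > 0) then
      max_value = maxval(column1)
    else
      max_value = 0.0
    end if
  end subroutine one_column
end module column_io
```

The main program then only needs use column_io and a declaration such as real, allocatable :: Misc_Array(:) before calling one_column.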
I think it is important to show the corrected code so that future users can read the question and also see the solution. I broke the subroutine into a series of smaller functions and one subroutine, to keep the data as local as possible, and put them in a module. The main program and module are shown below. The main program calls the functions twice, just to show that they can be used modularly to open multiple files.
PROGRAM Main
! - Author: Jonathan A. Webb
! - Date: December 11, 2014
! - Purpose: This code calls subprograms in module READ_COLUMNAR_FILE
! to determine the number of elements in an input file, the
! largest element in the input file and reads in the column of
! data as an allocatable array
USE READ_COLUMNAR_FILE
IMPLICIT NONE
!********************* **********************
!********************* VARIABLE DEFINITIONS **********************
!********************* **********************
CHARACTER(LEN=13) :: File_Name
INTEGER :: Size_Misc,Size_Bar,Unit_Number
REAL :: Peak_Misc_Value,Peak_Bar_Value
REAL, ALLOCATABLE, DIMENSION(:) :: Misc_Array,Bar_Array
!********************* **********************
!********************* FILE READER BLOCK **********************
!********************* **********************
! - This section reads in data from all of the columnar input decks.
! User defines the input file name and number
File_Name = "Misc.txt"; Unit_Number = 100
! Determines the number of rows in the file
Size_Misc = File_Length(File_Name,Unit_Number)
! Yields the allocatable array and the largest element in the array
CALL Read_File(File_Name,Unit_Number,Misc_Array,Peak_Misc_Value)
File_Name = "Bar.txt"; Unit_Number = 100
Size_Bar = File_Length(File_Name,Unit_Number)
CALL Read_File(File_Name,Unit_Number,Bar_Array,Peak_Bar_Value)
END PROGRAM Main
MODULE READ_COLUMNAR_FILE
! ***
! Author: Jonathan A. Webb ***
! Purpose: Compilation of subprograms required to read in multi-column ***
! data files ***
! Drafted: December 11, 2014 ***
! ***
IMPLICIT NONE
! Public functions and subroutines for this module
PUBLIC :: Read_File
PUBLIC :: File_Length
! Private functions and subroutines for this module
PRIVATE :: Check_File
CONTAINS
SUBROUTINE Check_File(Unit_Number,Open_Status,File_Name)
   INTEGER, INTENT(IN) :: Unit_Number
   INTEGER, INTENT(OUT) :: Open_Status
   CHARACTER(LEN=*), INTENT(IN) :: File_Name
   ! Check to see if the file exists
   OPEN(UNIT=Unit_Number,FILE = File_Name,STATUS='old',ACTION='read', &
        IOSTAT = Open_Status)
   IF(Open_Status .GT. 0) THEN
      WRITE(*,*) "**** Cannot Open ", File_Name," ****"
   END IF
END SUBROUTINE Check_File
FUNCTION File_Length(File_Name,Unit_Number)
   INTEGER :: File_Length
   CHARACTER(LEN=*), INTENT(IN) :: File_Name
   INTEGER, INTENT(IN) :: Unit_Number
   INTEGER :: Open_Status,Input_Status
   ! Calls subroutine to check on status of file
   CALL Check_File(Unit_Number,Open_Status,File_Name)
   IF(Open_Status .GT. 0)THEN
      WRITE(*,*) "**** Cannot Read", File_Name," ****"
      STOP
   END IF
   ! Determine File Size
   File_Length = 0
   DO
      READ(Unit_Number,*,IOSTAT = Input_Status)
      IF(Input_Status .LT. 0) EXIT
      File_Length = File_Length + 1
   END DO
   CLOSE(Unit_Number) ! so the next OPEN starts at the top of the file
END FUNCTION File_Length
SUBROUTINE Read_File(File_Name,Unit_Number,Column1,Max_Value)
   CHARACTER(LEN=*), INTENT(IN) :: File_Name
   INTEGER, INTENT(IN) :: Unit_Number
   REAL, ALLOCATABLE, DIMENSION(:), INTENT(OUT) :: Column1
   REAL, INTENT(OUT) :: Max_Value
   INTEGER :: Array_Size,n
   ! Determines the array size and allocates the array
   Array_Size = File_Length(File_Name,Unit_Number)
   ALLOCATE (Column1(Array_Size))
   ! - Reads in columnar array and determines the element with
   ! the largest value
   Max_Value = 0.0
   OPEN(UNIT= Unit_Number,File = File_Name,STATUS='old',ACTION='read')
   DO n = 1,Array_Size
      READ(Unit_Number,*) Column1(n)
      IF(Column1(n) .GT. Max_Value) Max_Value = Column1(n)
   END DO
   CLOSE(Unit_Number)
END SUBROUTINE Read_File
END MODULE READ_COLUMNAR_FILE
I need to find out how to use the dimension attribute in this program. The problem I can't figure out is how the user can specify the number of rows (in other words, the number of students):
implicit none
write(*,*)'how many student are in the classroom?'
write(*,*)k,'.','student name=';read(*,*)B(k)
write(*,*)'Final Quiz';read(*,*)A(k,3)
write(10,9)B(k),' ',A(k,1),' ',A(k,2),' ',A(k,3),' ',A(k,4)
end do
9 format(1x,A10,A5,F5.1,A3,F5.1,A3,F5.1,A3,F5.1)
end program
Basically, you have fixed-size (static) arrays, which are declared, e.g., using dimension:
real,dimension(4) :: X
X is an array of length 4 (indices 1 to 4). This is equivalent to:
real :: X(4)
Static arrays have a fixed length throughout their scope (e.g. throughout the program for global variables, or throughout a function or subroutine for local ones).
What you need are allocatable arrays which are allocated at runtime:
program test
implicit none
real, allocatable :: B(:) ! The shape is given by ":" - 1 dimension
integer :: stat
! allocate memory, four elements:
allocate( B(4), stat=stat )
! *Always* check the return value
if ( stat /= 0 ) stop 'Cannot allocate memory'
! ... Do stuff
! Clean up
deallocate( B )
! Allocate again using a different length:
allocate( B(3), stat=stat )
! *Always* check the return value
if ( stat /= 0 ) stop 'Cannot allocate memory'
! No need to deallocate at the end of the program!
end program
For the two-dimensional array A, the declaration is analogous, with one colon per dimension:
real,dimension(:,:),allocatable ::A
This works! Thank you guys.
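A sketch applying the allocatable approach to the question's two arrays, assuming B holds names of up to 10 characters (matching the A10 edit descriptor in the question's format) and A has four score columns (matching A(k,1) to A(k,4)). setup_class is a hypothetical helper; in the question's program, the number of students would first be read with read(*,*) and then passed in.

```fortran
module classroom
  ! Sketch: arrays whose number of rows is known only at run time.
  ! setup_class is a hypothetical name.
  implicit none
  private
  public :: setup_class
contains
  subroutine setup_class(nstudents, names, scores)
    integer, intent(in) :: nstudents
    character(len=10), allocatable, intent(out) :: names(:)
    real, allocatable, intent(out) :: scores(:,:)
    integer :: stat

    ! One row per student; the 4 columns hold the quiz scores.
    allocate(names(nstudents), scores(nstudents, 4), stat=stat)
    if (stat /= 0) error stop 'Cannot allocate memory'
    names = ' '
    scores = 0.0
  end subroutine setup_class
end module classroom
```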
I am trying to write a wrapper for the allocate statement, i.e. a subroutine that receives an array and its dimensions, allocates memory, and returns the allocated array. The most important thing is that the subroutine must work with arrays of different rank. But I have to state the rank of the array explicitly in the subroutine's interface, and in that case the code only compiles if I pass arrays of that particular rank as an argument. For example, this code does not compile:
module memory_allocator
contains
subroutine memory(array, length)
implicit none
real(8), allocatable, intent(out), dimension(:) :: array
integer, intent(in) :: length
integer :: ierr
print *, "memory: before: ", allocated(array)
allocate(array(length), stat=ierr)
if (ierr /= 0) then
print *, "error allocating memory: ierr=", ierr
end if
print *, "memory: after: ", allocated(array)
end subroutine memory
subroutine freem(array)
implicit none
real(8), allocatable, intent(inout), dimension(:) :: array
print *, "freem: before: ", allocated(array)
if (allocated(array)) deallocate(array)
print *, "freem: after: ", allocated(array)
end subroutine freem
end module memory_allocator
program alloc
use memory_allocator
implicit none
integer, parameter :: n = 3
real(8), allocatable, dimension(:,:,:) :: foo
integer :: i, j, k
print *, "main: before memory: ", allocated(foo)
call memory(foo, n*n*n)
print *, "main: after memory: ", allocated(foo)
do i = 1,n
do j = 1,n
do k = 1, n
foo(i, j, k) = real(i*j*k)
end do
end do
end do
print *, foo
print *, "main: before freem: ", allocated(foo)
call freem(foo)
print *, "main: after freem: ", allocated(foo)
end program alloc
Compilation error:
gfortran -o alloc alloc.f90 -std=f2003
call memory(foo, n*n*n)
Error: Rank mismatch in argument 'array' at (1) (1 and 3)
call freem(foo)
Error: Rank mismatch in argument 'array' at (1) (1 and 3)
Is there any way of implementing such a wrapper?
This can be done via a generic interface block. You have to create procedures for each rank that you want to handle, e.g., memory_1d, memory_2d, ... memory_4d. (Obviously a lot of cut & pasting.) Then you write a generic interface block that gives all of these procedures the alternative name memory as a generic procedure name. When you call memory, the compiler determines which memory_Xd should be called based on the rank of the argument. The same goes for your freem subroutines.
This is how intrinsic functions such as sin have long worked: you can call sin with real arguments of various precisions, or with a complex argument, and the compiler figures out which sin function to call. In really old FORTRAN you had to use different names for the different sin functions (SIN, DSIN, CSIN). In modern Fortran you can set up the same thing with your own routines.
Edit: adding a code example demonstrating the method & syntax:
module double_array_mod
implicit none
interface double_array
module procedure double_vector
module procedure double_array_2D
end interface double_array
private ! hides items not listed on public statement
public :: double_array
contains
subroutine double_vector (vector)
integer, dimension (:), intent (inout) :: vector
vector = 2 * vector
end subroutine double_vector
subroutine double_array_2D (array)
integer, dimension (:,:), intent (inout) :: array
array = 2 * array
end subroutine double_array_2D
end module double_array_mod
program demo_user_generic
use double_array_mod
implicit none
integer, dimension (2) :: A = [1, 2]
integer, dimension (2,2) :: B = reshape ( [11, 12, 13, 14], [2,2] )
integer :: i
write (*, '( / "vector before:", / 2(2X, I3) )' ) A
call double_array (A)
write (*, '( / "vector after:", / 2(2X, I3) )' ) A
write (*, '( / "2D array before:" )' )
do i=1, 2
write (*, '( 2(2X, I3) )' ) B (i, :)
end do
call double_array (B)
write (*, '( / "2D array after:" )' )
do i=1, 2
write (*, '( 2(2X, I3) )' ) B (i, :)
end do
end program demo_user_generic
subroutine memory(array, length) has as its first dummy argument a 1-dimensional array (real(8), allocatable, intent(out), dimension(:) :: array).
Calling this subroutine from your main program with the 3-dimensional array foo (real(8), allocatable, dimension(:,:,:) :: foo) is obviously an error, and that is what the compiler reported.
If you really need such subroutines, write one memory/freem pair for each array rank: one pair for 1-dimensional arrays, another for 2-dimensional arrays, etc.
By the way, the memory subroutines will generally differ from each other, because to allocate an n-dimensional array you need to pass n extents to the subroutine.
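A sketch combining both answers: one specific subroutine per rank, each taking as many extents as the array has dimensions, collected under the generic names memory and freem. Only ranks 1 and 3 are shown, and real(real64) from iso_fortran_env replaces the question's non-portable real(8); both choices are assumptions. With this module, the question's call memory(foo, n*n*n) becomes call memory(foo, n, n, n).

```fortran
module memory_allocator
  ! One specific procedure per rank, resolved by the generic names
  ! memory and freem according to the rank of the first argument.
  use iso_fortran_env, only: real64
  implicit none
  private
  public :: memory, freem

  interface memory
    module procedure memory_1d, memory_3d
  end interface memory

  interface freem
    module procedure freem_1d, freem_3d
  end interface freem

contains

  subroutine memory_1d(array, n1)
    real(real64), allocatable, intent(out) :: array(:)
    integer, intent(in) :: n1
    integer :: ierr
    allocate(array(n1), stat=ierr)
    if (ierr /= 0) print *, "error allocating memory: ierr=", ierr
  end subroutine memory_1d

  subroutine memory_3d(array, n1, n2, n3)
    real(real64), allocatable, intent(out) :: array(:,:,:)
    integer, intent(in) :: n1, n2, n3
    integer :: ierr
    allocate(array(n1, n2, n3), stat=ierr)
    if (ierr /= 0) print *, "error allocating memory: ierr=", ierr
  end subroutine memory_3d

  subroutine freem_1d(array)
    real(real64), allocatable, intent(inout) :: array(:)
    if (allocated(array)) deallocate(array)
  end subroutine freem_1d

  subroutine freem_3d(array)
    real(real64), allocatable, intent(inout) :: array(:,:,:)
    if (allocated(array)) deallocate(array)
  end subroutine freem_3d

end module memory_allocator
```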