Writing large Fortran binary files with access=stream

Writing large Fortran binary files with access=stream - fortran

I am having some problems understanding the formatting of binary files that I am writing using Fortran. I use the following subroutine to write binary files to disk:
SUBROUTINE write_field(d,m,outfile)
IMPLICIT NONE
REAL, INTENT(IN) :: d(m,m,m)
INTEGER, INTENT(IN) :: m
CHARACTER(len=256), INTENT(IN) :: outfile
OPEN(7,file=outfile,form='unformatted',access='stream')
WRITE(7) d
CLOSE(7)
END SUBROUTINE write_field
My understanding of the access=stream option was that this would suppress the standard header and footer that comes with a Fortran binary (see Fortran unformatted file format).
If I write a file with m=512 then my expectation is that the file should be 4 x 512^3 bytes = 536870912 bytes ~ 513 Mb however they are in fact 8 bytes longer than this, coming in at 536870920 bytes. My guess is that these extra bytes are the 4 byte header and footers, which I had wanted to suppress by using access='stream'.
The situation becomes confusing to me if I write a file with m=1024 then my expectation is that the file should be 4 x 1024^3 bytes = 4294967296 ~ 4.1 Gb however they are in fact 24(!) bytes longer than this, coming in at 4294967320 bytes. I do not understand why there are 24 extra bytes here, which would seem to correspond to 6(!) headers or footers.
My questions are:
(a) Is it possible to get Fortran to write a binary with no headers or footers?
(b) If the answer to (a) is 'no' then can I ensure that the larger binary has the same header and footer structure as the smaller binary?
(c) If the answers to (a) and (b) are both 'no' then how do I understand where these extra headers and footers are in the file.
I am using ifort (version 14.0.2) and I am writing the binary files on a small Linux cluster.
UPDATE: When running the same code with OSx and compiled with gfortran 7.3.0 the binary files come out with the expected sizes, as in they are always 4 x m^3 bytes, even when m=1024. So this problem seems to be related to the older compiler.
UPDATE: In fact, the problem is only present when using ifort 14.0.2 I have updated the text to reflect this.

This problem is solved by adding status='replace' in the Fortran open command. It is not to do with the compiler.
With access='stream' and without status='replace', the old binary file is not automatically replaced by the new binary file and is simply overwritten up to a certain point (https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/676047). This results in the old binary simply having bytes replaced up to the size of the new binary, while leaving any additional bytes, and the file size, unchanged. This is a problem if the new file size is smaller than the old file size. The problem difficult to diagnose because the time-stamp on the file is updated, so the file looks like it is new when queried using ls -l.
A minimal working example that recreates this problem is as follows:
PROGRAM write_binary_test_minimal
IMPLICIT NONE
REAL :: a
a=1.
OPEN(7,file='test',form='unformatted')
WRITE(7) a
CLOSE(7)
OPEN(7,file='test',form='unformatted',access='stream')
WRITE(7) a
CLOSE(7)
END PROGRAM write_binary_test_minimal
The first write generates a file 'test' of size 8 + 4 = 12 bytes. Where the 8 is the standard Fortran-binary header and footer and the 4 is the size in bytes of a. In the second write statement, even though access='stream' has been set, only the first 4 bytes of the previously-generated 'test' are overwritten, leaving the file as size 12 bytes! The solution to this is to change the second write statement to
OPEN(7,file='test',form='unformatted',access='stream',status='replace')
with an explicit status='replace' to ensure the old file is replaced.

Related

"Insufficient virtual memory" error for allocating small arrays

I have been using Fortran for a few months now, but I am self-taught and have only been learning it by reading someone else's codes so my knowledge of Fortran is very limited. I wrote this function which is meant to read a text file containing data and save these data in an array. Since I don't know the size of the data, I choose to allocate the array within the function.
FUNCTION RSEBIN(NAMEIN,NZNSEB)
IMPLICIT DOUBLE PRECISION (A-H, O-Z)
INTEGER DSEBTP, IIND, NZNSEB
CHARACTER(LEN=75) :: FILNAM
CHARACTER NAMEIN*(*)
REAL, ALLOCATABLE :: RSEBIN(:,:)
WRITE (FILNAM,1500) 'Extra_InputFiles/SEB_inputs/SEB_', NAMEIN,
2 '.txt' !Define the path and name of the input data text file
1500 FORMAT (A32,A,A4)
OPEN (UNIT=101, FILE=FILNAM, STATUS='OLD')
READ(101,*) !Skip the header
DSEBTP = 0
DO
READ(101,*,IOSTAT=IO) TRASH
IF (IO.NE.0) EXIT !Exit the loop when last line has been reached
DSEBTP = DSEBTP + 1 !Counts how many time periods inputs are set for the input data type
END DO
REWIND(101) !Rewind text file to read the inputs
ALLOCATE(RSEBIN(DSEBTP,NZNSEB+1)) !Allocate the input data array
READ(101,*) !Skip the header
DO 1510 ISEBTP=1,DSEBTP
READ(101,*) (RSEBIN(ISEBTP,IIND), IIND=1, NZNSEB+1) !Save the data in the main array
1510 CONTINUE
CLOSE (UNIT=101)
RETURN
END FUNCTION
I then use this function in another subroutine with this following statement:
ASEBAT = RSEBIN('AirTemperature',NZNSEB) !Allocate the air temperature array (first column is time)
When I try to run the program, I get a "Insufficient virtual memory" error. After a quick search, I discovered that this error usually occurs when one is allocating huge arrays. However, during my tests, I was only using a 3 X 5 array. After a few more tests, I realized that the function works fine if I declare the dimensions of my array RSEBIN rather than making it allocatable and allocating it in the function. However, this solution is not sustainable for me as I want this function to be able to read text files of various dimensions.
Does anyone have an idea why I have such error? Should I avoid allocating arrays in a function? As I said previously, I am fairly new to Fortran and I am pretty sure my code has many issues, so I apologize for my primitive code writing and would be grateful for any tip.
Also, I should note that I'm using the Intel Fortran Compiler from oneAPI for Windows. I recently switched from the fortran compiler in Intel XE, with which, if I can recall, I was using a similar function without any issue.
Thanks!

Reading many files in Fortran subroutine

I have a Fortran subroutine that is called several times in a main program (which I don't have access to). In my subroutine, I wish to read data from one of several (~10^4) files in every iteration based on an input argument. Each of the files has one line of data; and the format of my data is as follows:
0.97014199999999995 0.24253600000000000 0.0000000000000000
I'm using the following lines of code to open and read the files:
program test_read
implicit none
integer :: i, iopen_status, iread_status
real :: gb
CHARACTER(len=25) :: filename
CHARACTER(*), PARAMETER :: fileplace =
& "/home/ajax/hexmesh_readn/G3/"
dimension gb(3)
i = 5
WRITE(filename,'(a,I0,a)')'GBn_',i,'.txt'
open(unit=15,
& file=fileplace//filename,IOSTAT=iopen_status)
read (15,*,IOSTAT=iread_status) gb
print *,"gb",gb(1),gb(2),gb(3)
close(15)
end program test_read
In the main program, i is a variable, but I have a file for all possible values of i.
Now, this code works perfectly well when I run on my local machine. But, when I submit it along with the main program, it behaves somewhat weirdly. Specifically, it reads some of the files, but not others.
When I print out the IOSTAT for open and read, I see that the IOSTAT for open is 0 for all the files, whereas that for the read command is 0 for some, -1 for some and 29 for others! I looked up what the error code 29 means and I learned that it might indicate that the file is not found in the path. But the file is most definitely there.
Also, I don't see anything different about the files that it isn't able to read. In fact, I have even seen the same file giving an IOSTAT value of 0 and 29!
One thing to note is that I'm running the main program on several cores. Could this have anything to do with the error?

Are you running several instances of the same program simultaneously? On some operating systems, different programs can't simultaneously open the same file. Specifying that you want read-only access might allow access by multiple programs. On the Fortran open statement: action='read'.
If you are running a multi-threaded program, then different threads might be doing IO simultaneously on different files ... different unit numbers should be used by each thread to avoid conflicts.

sizeof in fortran

It's quite common in C-code to see stuff like:
malloc(sizeof(int)*100);
which will return a pointer to a block of memory big enough to hold 100 ints. Is there any equivalent in fortran?
Use case:
I have a binary file which is opened as:
open(unit=10,file='foo.dat',access='stream',form='unformatted',status='old')
I know that the file contains "records" which consist of a header with 20 integers, 20 real numbers and 80 characters, then another N real numbers. Each file can have hundreds of records. Basically, I'd like to read or write to a particular record in this file (assuming N is a fixed constant for simplicity).
I can easily calculate the position in the file I want to write if I know the size of each data-type:
header_size = SIZEOF_INT*20 + SIZEOF_FLOAT*20 + SIZEOF_CHAR*80
data_size = N*SIZEOF_FLOAT
position = (record_num-1)*(header_size+data_size)+1
Currently I have
!Hardcoded :-(
SIZEOF_INT = 4
SIZEOF_FLOAT = 4
SIZEOF_DOUBLE = 8
SIZEOF_CHAR = 1
Is there any way to do better?
constraints:
The code is meant to be run on a variety of platforms with a variety of compilers. A standard compliant solution is definitely preferred.

In your use case I think you could use
inquire(iolength=...) io-list
That will give you how many "file storage units" are required for the io-list. A caveat with calculating offsets in files with Fortran is that "file storage unit" need not be in bytes, and indeed I recall one quite popular compiler by default using a word (4 bytes) as the file storage unit. However, by using the iolength thing you don't need to worry about this issue.

#janneb's answer will address the OP's question, but it doesn't answer the "sizeof" question for Fortran.
A combination of inquire and file_storage_size will give the size of a type. Try this code:
program sizeof
use iso_fortran_env
integer :: num_file_storage_units
integer :: num_bytes
inquire(iolength=num_file_storage_units) 1.0D0
num_bytes = num_file_storage_units*FILE_STORAGE_SIZE/8
write(*,*) "double has size: ", num_bytes
end program sizeof
See:
http://gcc.gnu.org/onlinedocs/gfortran/ISO_005fFORTRAN_005fENV.html
http://h21007.www2.hp.com/portal/download/files/unprot/fortran/docs/lrm/lrm0514.htm

If all the records are the same, this would seem to be a case to use direct access rather than stream access. Then don't calculate the position in the file, you tell the compiler the record that you want, and it accesses it. Unless you want these files to be portable across platforms or the records are not all the same ... then you have to have more control or calculate the length of the records. While the original Fortran 90 concept was to declare variables according to the required precision, there are now portable ways to declare variables by size. Either with types provided by the already mentioned iso_c_binding module, or from the iso_fortran_env module.

Unexpected "padding" in a Fortran unformatted file

I don't understand the format of unformatted files in Fortran.
For example:
open (3,file=filename,form="unformatted",access="sequential")
write(3) matrix(i,:)
outputs a column of a matrix into a file. I've discovered that it pads the file with 4 bytes on either end, however I don't really understand why, or how to control this behavior. Is there a way to remove the padding?

For unformated IO, Fortran compilers typically write the length of the record at the beginning and end of the record. Most but not all compilers use four bytes. This aids in reading records, e.g., length at the end assists with a backspace operation. You can suppress this with the new Stream IO mode of Fortran 2003, which was added for compatibility with other languages. Use access='stream' in your open statement.

I never used sequential access with unformatted output for this exact reason. However it depends on the application and sometimes it is convenient to have a record length indicator (especially for unstructured data). As suggested by steabert in Looking at binary output from fortran on gnuplot, you can avoid this by using keyword argument ACCESS = 'DIRECT', in which case you need to specify record length. This method is convenient for efficient storage of large multi-dimensional structured data (constant record length). Following example writes an unformatted file whose size equals the size of the array:
REAL(KIND=4),DIMENSION(10) :: a = 3.141
INTEGER :: reclen
INQUIRE(iolength=reclen)a
OPEN(UNIT=10,FILE='direct.out',FORM='UNFORMATTED',&
ACCESS='DIRECT',RECL=reclen)
WRITE(UNIT=10,REC=1)a
CLOSE(UNIT=10)
END
Note that this is not the ideal aproach in sense of portability. In an unformatted file written with direct access, there is no information about the size of each element. A readme text file that describes the data size does the job fine for me, and I prefer this method instead of padding in sequential mode.

Fortran IO is record based, not stream based. Every time you write something through write() you are not only writing the data, but also beginning and end markers for that record. Both record markers are the size of that record. This is the reason why writing a bunch of reals in a single write (one record: one begin marker, the bunch of reals, one end marker) has a different size with respect to writing each real in a separate write (multiple records, each of one begin marker, one real, and one end marker). This is extremely important if you are writing down large matrices, as you could balloon the occupation if improperly written.

Fortran Unformatted IO I am quite familiar with differing outputs using the Intel and Gnu compilers. Fortunately my vast experience dating back to 1970's IBM's allowed me to decode things. Gnu pads records with 4 byte integer counters giving the record length. Intel uses a 1 byte counter and a number of embedded coding values to signify a continuation record or the end of a count. One can still have very long record lengths even though only 1 byte is used.
I have software compiled by the Gnu compiler that I had to modify so it could read an unformatted file generated by either compiler, so it has to detect which format it finds. Reading an unformatted file generated by the Intel compiler (which follows the "old' IBM days) takes "forever" using Gnu's fgetc or opening the file in stream mode. Converting the file to what Gnu expects results in a factor of up to 100 times faster. It depends on your file size if you want to bother with detection and conversion or not. I reduced my program startup time (which opens a large unformatted file) from 5 minutes down to 10 seconds. I had to add in options to reconvert back again if the user wants to take the file back to an Intel compiled program. It's all a pain, but there you go.

Importing data from file to array

I have 2 dimensional table in file, which look like this:
11, 12, 13, 14, 15
21, 22, 23, 24, 25
I want it to be imported in 2 dimensional array. I wrote this code:
INTEGER :: SMALL(10)
DO I = 1, 3
READ(UNIT=10, FMT='(5I4)') SMALL
WRITE(UNIT=*, FMT='(6X,5I4)') SMALL
ENDDO
But it imports everything in one dimensional array.
EDIT:
I've updated code:
program filet
integer :: reason
integer, dimension(2,5) :: small
open(10, file='boundary.inp', access='sequential', status='old', FORM='FORMATTED')
rewind(10)
DO
READ(UNIT=10, FMT='(5I4)', iostat=reason) SMALL
if (reason /= 0) exit
WRITE(UNIT=*, FMT='(6X,5I4)') SMALL
ENDDO
write (*,*) small(2,1)
end program
Here is output:
11 12 13 14 15
21 22 23 24 25
12

Well, you have defined SMALL to be a 1-D array, and Fortran is just trying to be helpful. You should perhaps have defined SMALL like this;
integer, dimension(2,5) :: small
What happened when the read statement was executed was that the system ran out of edit descriptor (you specified 5 integers) before either SMALL was full or the end of the file was encountered. If I remember rightly Fortran will re-use the edit descriptor until either SMALL is full or the end-of-file is encountered. But this behaviour has been changed over the years, according to Fortran standards, and various compilers have implemented various non-standard features in this part of the language, so you may need to check your compiler's documentation or do some more experiments to figure out exactly what happens.
I think your code is also a bit peculiar in that you read from SMALL 3 times. Why ?
EDIT: OK, we're getting there. You have just discovered that Fortran stores arrays in column-major order. I believe that most other programming languages store them in row-major order. In other words, the first element of your array is small(1,1), the second (in memory) is small(2,1), the third is small(1,2) and so forth. I think that your read (and write) statements are not standard but widely implemented (which is not unusual in Fortran compilers). I may be wrong, it may be standard. Either way, the read statement is being interpreted to read the elements of small in column-major order. The first number read is put in small(1,1), the second in small(2,1), the third in small(1,2) and so on.
Your write statement makes use of the same feature; you might have discovered this for yourself if you had written out the elements in loops with the indices printed too.
The idiomatic Fortran way of reading an array and controlling the order in which elements are placed into the array, is to include an implied-do loop in the read statement, like this:
READ(UNIT=10, FMT='(5I4)', iostat=reason) ((SMALL(row,col), col = 1,numCol), row=1,numRow)
You can also use this approach in write statements.
You should also study your compiler documentation carefully and determine how to switch on warnings for all non-standard features.

Adding to what High Performance Mark wrote...
If you want to use commas to separate the numbers, then you should use list-directed IO rather than formatted IO. (Sometimes this is called format-free IO, but that non-standard term is easy to confuse with binary IO). This is easier to use since you don't have to arrange the numbers precisely in columns and can separate them with spaces or commas. The read is simply "read (10, *) variables"
But sticking to formatted IO, here is some sample code:
program demo1
implicit none
integer, dimension (2,5) :: small
integer :: irow, jcol
open ( unit=10, file='boundary.txt', access='sequential', form='formatted' )
do irow=1, ubound (small, 1)
read (10, '(5I4)') (small (irow, jcol), jcol=1, ubound (small, 2))
end do
write (*, '( / "small (1,2) =", I2, " and small (2,1)=", I2 )' ) small (1,2), small (2,1)
end program demo1
Using the I4 formatted read, the data need to be in columns:
12341234123412341234
11 12 13 14 15
21 22 23 24 25
The data file shouldn't contain the first row "1234..." -- that is in the example to make the alignment required for the format 5I4 clear.
With my example program, there is an outer do loop for irow and an "implied do loop" as part of the read statement. You could also eliminate the outer do loop and use two implied do loops on the read statement, as High Performance Mark showed. In this case, if you kept the format specification (5I4), it would get reused to read the second line -- this is called format reversion. (On a more complicated format, one needs to read the rules to understand which part of the format is reused in format reversion.) This is standard, and has been so at least since FORTRAN 77 and probably FORTRAN IV. (Of course, the declarations and style of my example are Fortran 90).
I used "ubound" so that you neither have to carry around variables storing the dimensions of the array, nor use specific numeric values. The later method can cause problems if you later decide to change the dimension of the array -- then you have to hunt down all of the specific values (here 2 and 5) and change them.
There is no need for a rewind after an open statement.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Writing large Fortran binary files with access=stream - fortran

Related

"Insufficient virtual memory" error for allocating small arrays

Reading many files in Fortran subroutine

sizeof in fortran

Unexpected "padding" in a Fortran unformatted file

Importing data from file to array

Categories

Resources