GFortran: Read file bigger than 2GB


Does GFortran allow 8-byte integers as values for the POS= specifier of READ and INQUIRE?
Does GFortran have an 8-byte version of FTELL for getting file positions past 2 GB?
The Intel Fortran compiler has an 8-byte integer version of ftell called ftelli8, but I can't find anything similar for GFortran.

The Fortran standard doesn't require a specific integer kind for the POS= specifier of READ. You can use any integer kind, including 8-byte integers.
GFortran's nonstandard FTELL function returns an integer of kind 8 on my 64-bit system, which is an 8-byte integer in gfortran. You can easily check this with a simple program:
print *, kind(FTELL(6))
end
which prints 8, or
print *, bit_size(FTELL(6))
end
which prints 64.
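As a sketch of using 64-bit positions with standard stream access (assuming a Fortran 2003 compiler such as gfortran, with int32/int64 taken from iso_fortran_env), the module below reads a 4-byte integer at a byte position held in an int64 variable and then asks INQUIRE for the next position. The module and procedure names are hypothetical, chosen only for illustration; because POS is int64, offsets beyond 2147483647 bytes can be expressed the same way.

```fortran
! Hypothetical helper module: random access into a stream file using
! 64-bit (int64) byte positions, so offsets past 2 GB are representable.
module stream_pos_demo
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
contains
  ! Read one 4-byte integer starting at 1-based byte position POS
  ! and return, in NEXT_POS, where the following transfer would start.
  subroutine read_int_at(unit, pos, value, next_pos)
    integer, intent(in) :: unit
    integer(int64), intent(in) :: pos
    integer(int32), intent(out) :: value
    integer(int64), intent(out) :: next_pos
    read(unit, pos=pos) value
    ! INQUIRE with POS= reports the current file position as int64.
    inquire(unit=unit, pos=next_pos)
  end subroutine read_int_at
end module stream_pos_demo
```

The unit must be opened with access='stream' and form='unformatted'; positions count from 1, so the second 4-byte value in a file starts at position 5.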

Related

Can overindexing in FORTRAN 77 modify the program itself?

Here is a little program in FORTRAN 77
dimension totlev(20)
do 100 i=1,24
totlev(i)=0.0
write(0,*) 'totlev i=',i, totlev(i)
100 continue
end
I compile it with MinGW by typing gfortran test.f and get warnings (not errors):
test.f:4:14:
do 100 i=1,25
2
totlev(i)=0.0
1
Warning: Array reference at (1) out of bounds (25 > 20) in loop beginning at (2)
test.f:5:40:
test.f:3:72:
do 100 i=1,25
2
test.f:5:40:
write(0,*) 'totlev i=',i, totlev(i)
1
Warning: Array reference at (1) out of bounds (25 > 20) in loop beginning at (2)
However, such a warning would not always be produced in a longer program. An executable is created, and when I run it, it behaves like an infinite loop.
This is my problem: how is an infinite loop even possible with a DO loop? Isn't it a logical impossibility? My only explanation is that indexing out of bounds here reaches the program code itself and changes it. Is that possible?
I am using Windows 7, if that's relevant.

It's not changing the code, it's changing the variable i. Both the array totlev(20) and the scalar i are local variables, and thus typically stored in the program's stack frame (though the standard leaves this choice to the 'processor', Fortran-speak for the implementation). In this case the compiler apparently put i three reals (probably 12 bytes) past the end of totlev, exactly where totlev(24) would be, so assigning to totlev(24) actually changes i. Fortran basically requires that default integer and default (single-precision) real variables occupy the same amount of storage, and while it doesn't require any particular relationship between their representations, most machines today use IEEE 754 floating point, in which a real 0.0 has the same bit pattern as an integer 0. So each time the loop executes totlev(24)=0.0 it resets i to 0, and the loop starts over instead of finishing.
On many though not all computer architectures it is possible to address code by indexing an array out of range, but this almost always requires indexes far out of range: millions or billions or more, not one or two. On older architectures it was often possible both to read and write code this way, but most systems since about 1980 have memory protection so that you can't write to code. In particular all Windows NT-series systems do this, which includes Windows 7.
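One way to rule out this kind of overrun is to take the loop bounds from the array itself rather than from a hard-coded number. The sketch below is a hypothetical module (names are illustrative) that clears an assumed-shape array using lbound/ubound; compiling with gfortran's -fcheck=bounds option additionally turns any remaining out-of-bounds access into a run-time error instead of silent corruption.

```fortran
! Hypothetical module: clear an array using its own bounds,
! so the loop can never index past the last element.
module safe_loop_demo
  implicit none
contains
  subroutine clear_levels(totlev)
    real, intent(inout) :: totlev(:)
    integer :: i
    ! Bounds come from the actual argument, not a literal like 24.
    do i = lbound(totlev, 1), ubound(totlev, 1)
      totlev(i) = 0.0
    end do
  end subroutine clear_levels
end module safe_loop_demo
```

For a whole-array reset, the array assignment totlev = 0.0 achieves the same thing without an explicit loop.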

Writing large Fortran binary files with access=stream

I am having some problems understanding the formatting of binary files that I am writing using Fortran. I use the following subroutine to write binary files to disk:
SUBROUTINE write_field(d,m,outfile)
IMPLICIT NONE
INTEGER, INTENT(IN) :: m
REAL, INTENT(IN) :: d(m,m,m)
CHARACTER(len=*), INTENT(IN) :: outfile
OPEN(7,file=outfile,form='unformatted',access='stream')
WRITE(7) d
CLOSE(7)
END SUBROUTINE write_field
My understanding was that the access='stream' option would suppress the record header and footer that come with a sequential unformatted Fortran file (see Fortran unformatted file format).
If I write a file with m=512, I expect the file to be 4 x 512^3 = 536870912 bytes (512 MiB); however, it is in fact 8 bytes longer, at 536870920 bytes. My guess is that these extra bytes are the 4-byte header and footer, which I had wanted to suppress by using access='stream'.
The situation becomes more confusing if I write a file with m=1024. I expect the file to be 4 x 1024^3 = 4294967296 bytes (4 GiB); however, it is in fact 24(!) bytes longer, at 4294967320 bytes. I do not understand why there are 24 extra bytes here, which would seem to correspond to 6(!) headers or footers.
My questions are:
(a) Is it possible to get Fortran to write a binary with no headers or footers?
(b) If the answer to (a) is 'no', can I ensure that the larger binary has the same header and footer structure as the smaller binary?
(c) If the answers to (a) and (b) are both 'no', how can I work out where these extra headers and footers are in the file?
I am using ifort (version 14.0.2) and I am writing the binary files on a small Linux cluster.
UPDATE: When the same code is compiled with gfortran 7.3.0 and run on OS X, the binary files come out with the expected sizes, that is, always 4 x m^3 bytes, even when m=1024. So this problem seems to be related to the older compiler.
UPDATE: In fact, the problem is only present when using ifort 14.0.2. I have updated the text to reflect this.

This problem is solved by adding status='replace' to the Fortran OPEN statement. It has nothing to do with the compiler.
With access='stream' and without status='replace', the old binary file is not automatically replaced by the new one; it is simply overwritten up to a certain point (https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/676047). The old file has its bytes replaced up to the size of the new data, while any additional bytes, and the file size, stay unchanged. This is a problem whenever the new file is smaller than the old one. The problem is difficult to diagnose because the file's timestamp is updated, so the file looks new when queried with ls -l.
A minimal working example that recreates this problem is as follows:
PROGRAM write_binary_test_minimal
IMPLICIT NONE
REAL :: a
a=1.
OPEN(7,file='test',form='unformatted')
WRITE(7) a
CLOSE(7)
OPEN(7,file='test',form='unformatted',access='stream')
WRITE(7) a
CLOSE(7)
END PROGRAM write_binary_test_minimal
The first write generates a file 'test' of 8 + 4 = 12 bytes, where the 8 bytes are the standard Fortran sequential-record header and footer (4 bytes each) and the 4 bytes are the value of a. In the second write, even though access='stream' has been set, only the first 4 bytes of the previously generated 'test' are overwritten, leaving the file at 12 bytes! The solution is to change the second OPEN statement to
OPEN(7,file='test',form='unformatted',access='stream',status='replace')
with an explicit status='replace' to ensure the old file is replaced.
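Putting the fix into the original subroutine, a minimal sketch could look like the hypothetical module below (names are illustrative; newunit= assumes a Fortran 2008 compiler). It writes the raw data with stream access and status='replace', and provides a small helper that reads the file size back with INQUIRE SIZE=, which for stream files is in file storage units (bytes on common systems). With default 4-byte reals, writing 10 values and then 1 value to the same file should leave it at 4 bytes rather than 40.

```fortran
! Hypothetical module: write raw binary data with no record markers,
! truncating any previous file of the same name.
module stream_write_demo
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  subroutine write_raw(d, outfile)
    real, intent(in) :: d(:)
    character(len=*), intent(in) :: outfile
    integer :: u
    ! status='replace' discards the old file instead of overwriting in place.
    open(newunit=u, file=outfile, form='unformatted', access='stream', &
         status='replace', action='write')
    write(u) d
    close(u)
  end subroutine write_raw

  ! File size as reported by INQUIRE (file storage units, usually bytes).
  function file_size(path) result(nbytes)
    character(len=*), intent(in) :: path
    integer(int64) :: nbytes
    inquire(file=path, size=nbytes)
  end function file_size
end module stream_write_demo
```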

How to format an integer to have only the needed size?

I have been experimenting with the following code:
program hello
write(*,"(i9)") 10
end program hello
and varying the format string, trying to make write output a string just as wide as needed to represent the integer, but so far I have been unable to manage it. How can I write 'fit' integers in Fortran?

An I0 edit descriptor is the correct way to output integers with exactly the width they need. It was introduced in Fortran 95, and all current Fortran compilers that started out as Fortran 90 compilers were updated to Fortran 95 years ago.
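The same descriptor also works for internal writes, which is handy when the integer has to be embedded in a string (for example a file name). The sketch below is a hypothetical helper (the name itoa is illustrative) that writes with '(i0)' into a buffer and returns the trimmed result as a deferred-length string, a Fortran 2003 feature. For plain output, write(*,"(i0)") 10 prints 10 with no padding.

```fortran
! Hypothetical module: convert an integer to a string of exactly
! the needed width using the I0 edit descriptor.
module fit_int_demo
  implicit none
contains
  function itoa(n) result(s)
    integer, intent(in) :: n
    character(len=:), allocatable :: s
    character(len=32) :: buf   ! comfortably wider than any default integer
    write(buf, '(i0)') n       ! I0: minimal width, left in the buffer
    s = trim(buf)              ! drop the unused trailing blanks
  end function itoa
end module fit_int_demo
```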

lseek with 2 gb file in windows [duplicate]

Possible Duplicate:
Question about file seeking position
I am facing a problem with lseek(). It fails when we try to access a file larger than 2 GB on Windows (32-bit machine). Is there a limit up to which lseek can set the file pointer in the file we are using?
The offset value is 2154654555.
Compiler Details
c:\Program Files\Inno Setup 5\Compil32.exe

You should have a look at _lseeki64, which takes 64-bit offsets. lseek() (and its successor, _lseek()) are limited to signed 32-bit offsets, which have an upper limit of 2147483647. Your offset of 2154654555 exceeds that (and would be treated as a negative number if stored in a 32-bit long). See http://msdn.microsoft.com/en-us/library/1yee101t. (You'll need something comparable for your compiler.)

Here off_t, the type of the offset argument to lseek(), is a 32-bit type, so its maximum value is 2147483647.

lseek can't seek beyond 2 GB when its offset argument is a signed 32-bit variable, which cannot hold a value greater than 2147483647. Many operating systems support larger offsets, either through compile-time macros (for example _FILE_OFFSET_BITS=64 with glibc) or by providing alternative functions.
With the MSVC compiler you can try _lseeki64, which takes a 64-bit offset. Since you are not using MSVC, check whether your compiler provides an equivalent function.
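The arithmetic behind the failure can be checked directly. The sketch below (in Fortran, to match the rest of this page; the module and function names are hypothetical) tests whether an offset fits in a signed 32-bit integer and computes the value it wraps to when truncated to 32-bit two's complement, which is what typical platforms do when such a value is forced into a 32-bit long. It is only an illustration of the overflow, not a replacement for _lseeki64.

```fortran
! Hypothetical module: show why a 2 GB+ offset cannot be passed
! through a signed 32-bit offset parameter.
module offset_check_demo
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none
contains
  ! True if OFFSET lies within the range of a signed 32-bit integer.
  logical function fits_in_int32(offset)
    integer(int64), intent(in) :: offset
    fits_in_int32 = offset >= -int(huge(0_int32), int64) - 1_int64 .and. &
                    offset <= int(huge(0_int32), int64)
  end function fits_in_int32

  ! Value OFFSET takes after truncation to 32-bit two's complement.
  integer(int64) function wrap_to_int32(offset)
    integer(int64), intent(in) :: offset
    integer(int64), parameter :: two31 = 2_int64**31, two32 = 2_int64**32
    wrap_to_int32 = modulo(offset + two31, two32) - two31
  end function wrap_to_int32
end module offset_check_demo
```

For the offset in the question, fits_in_int32(2154654555_int64) is false and wrap_to_int32 gives -2140312741, a negative offset that lseek rejects.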

Fortran I/O: Specifying large record sizes

I am trying to write an array to file, where I have opened the file this way:
open(unit=20, FILE="output.txt", form='unformatted', access='direct', recl=sizeof(u))
Here, u is an array and sizeof(u) is 2730025920, which is ~2.5GB.
When I run the program, I get an error Fortran runtime error: RECL parameter is non-positive in OPEN statement, which I believe means that the record size is too large.
Is there a way to handle this? One option would be to write the array with more than one write call, so that each record is smaller than 2 GB. But I am wondering if I can write the entire array in a single call.
Edit:
u has been declared as double precision u(5,0:408,0:408,0:407)
The program was compiled as gfortran -O3 -fopenmp -mcmodel=medium test.f
There is some OpenMP code in this program, but the file I/O is sequential.
gfortran v 4.5.0, OS: Opensuse 11.3 on 64 bit AMD Opteron
Thanks for your help.

You should be able to write big arrays as long as memory permits. It seems that you are getting integer overflow with the sizeof function. sizeof is not standard Fortran and I would not recommend using it (implementations may vary between compilers). Instead, it is better practice to use the inquire statement to obtain the record length. I was able to reproduce your problem with ifort, and this solution works for me. You can avoid integer overflow by declaring a higher-kind variable:
integer(kind=8) :: reclen
inquire(iolength=reclen)u
open(unit=20,file='output.txt',form='unformatted',&
access='direct',recl=reclen)
EDIT: After some investigation, this seems to be a gfortran problem. Setting a higher kind for the integer reclen solves the problem for ifort and pgf90, but not for gfortran; I just tried this with version 4.6.2. Even though reclen has the correct positive value, gfortran apparently handles recl internally as a 32-bit signed integer (thanks @M.S.B. for pointing this out). The run-time error message is consistent with this: it reports a non-positive RECL rather than a value above some maximum. I doubt it is an OS issue. If possible, try using ifort (free for non-commercial use): Intel Non-Commercial Software Download.
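If the compiler limits RECL to 32 bits, the multi-write option from the question is a workable fallback. The sketch below is a hypothetical module (names are illustrative) that writes a rank-4 double precision array as one direct-access record per slice of the last dimension, taking the record length from INQUIRE(IOLENGTH=) so no compiler-specific unit assumption is needed. For the array in the question each record would be 5*409*409 doubles, roughly 6.7 MB, well below 2 GB.

```fortran
! Hypothetical module: write a large rank-4 array as many direct-access
! records (one per last-index slice) so each RECL stays small.
module chunked_direct_demo
  implicit none
contains
  subroutine write_slices(u4, path)
    double precision, intent(in) :: u4(:,:,:,:)
    character(len=*), intent(in) :: path
    integer :: unit, reclen, k
    ! Record length of one slice, in the processor's RECL units.
    inquire(iolength=reclen) u4(:,:,:,1)
    open(newunit=unit, file=path, form='unformatted', access='direct', &
         recl=reclen, status='replace')
    do k = 1, size(u4, 4)
      write(unit, rec=k) u4(:,:,:,k)   ! record k holds slice k
    end do
    close(unit)
  end subroutine write_slices
end module chunked_direct_demo
```

Note that the assumed-shape dummy starts its indices at 1, so for u(5,0:408,0:408,0:407) the slice u(:,:,:,j) ends up in record j+1.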
