I'm trying to port a code from the ifort compiler to the IBM xlf compiler. It works well under ifort on Red Hat, but gives results containing "NaNQ" under xlf on AIX. It turns out that an out-of-bounds array read in the code causes this problem; here is a simplified example:
program main
implicit none
real(8)::a(1,0:10)=0.D0
print *, a(1,-1)
end program main
Both compilers compile it successfully, without any error or warning.
On ifort I get result:
0.000000000000000E+000
But on xlf, I get:
0.247032822920623272E-322
However, if I read further beyond the bound, xlf refuses to compile, while ifort still compiles successfully.
program main
implicit none
real(8)::a(1,0:10)=0.D0
print *, a(1,-3:-1)
end program main
On ifort I get:
0.000000000000000E+000 0.000000000000000E+000 0.000000000000000E+000
On xlf it won't compile:
"1.f90", line 5.9: 1516-023 (S) Subscript is out of bounds.
** main === End of Compilation 1 ===
1501-511 Compilation failed for file 1.f90.
Why do ifort and xlf treat this out-of-bounds read differently? Is there any way to make the compiler check strictly and prevent out-of-bounds reads from happening? After all, it took me a long time to catch this bug, since our group has been using this code for more than 15 years without any problems on ifort. Thanks.
Most Fortran compilers have options to check for array bounds errors at runtime. In these examples with constant indices the error can even be found at compile time, which some compilers do without non-default options and others do not. With ifort, use -check bounds to request array bounds checking; you can get additional checking with -check all. These options are generally not the default because there is a runtime cost, but the cost of getting a wrong answer can be much higher! I have found the runtime cost to frequently be surprisingly low and recommend using runtime checks during code development, and even in production if the runtime cost is acceptable.
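For example, assuming the snippet above is saved as main.f90:
ifort -check bounds main.f90
gfortran -fcheck=bounds main.f90
xlf90 -C main.f90
The xlf line uses -C (equivalent to -qcheck), which to my knowledge enables runtime checking including bounds; check IBM's documentation for your version. With checking enabled, both test programs should either be rejected at compile time or abort with an explicit out-of-bounds error at runtime, instead of silently reading whatever sits next to the array.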
So, I want to help my researchers a bit with debugging Fortran programs, and for demonstration purposes I created a program that intentionally causes a segfault.
Here's the source:
program segfault
implicit none
integer :: n(10), i
integer :: ios, u
open(newunit=u, file='data.txt', status='old', action='read', iostat=ios)
if (ios /= 0) STOP "error opening file"
i = 0
do
i = i + 1
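! n has only 10 elements, but data.txt holds 100 values, so once i exceeds 10
! the read below stores out of bounds: this is the intentional bug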
read(u, *, iostat=ios) n(i)
if (ios /= 0) exit
end do
close(u)
print*, sum(n)
end program segfault
The data.txt file contains 100 random numbers:
for i in {1..100}; do
echo $RANDOM >> data.txt;
done
When I compile this program with
gfortran -O3 -o segfault.exe segfault.f90
the resulting executable dutifully crashes. But when I compile with debugging enabled:
gfortran -O0 -g -o segfault.exe segfault.f90
Then it reads in only the first 10 values, and prints their sum. For what it's worth, -O2 causes the desired segfault, -O1 does not.
I find this deeply concerning. After all, how can I debug properly if the bug goes away when I compile with debugging symbols enabled?
Can someone explain this behaviour?
I am using GNU Fortran (MacPorts gcc5 5.3.0_1) 5.3.0
A segfault is undefined behaviour. The program does not conform to the Fortran standard, so you cannot expect any particular outcome; it can do anything at all. You cannot count on a segfault happening, much less be deeply concerned when it does not happen.
There are compiler checks (-fcheck=) and sanitizers (-fsanitize=) available for a reason. Waiting for a segfault is not guaranteed to work. Not in Fortran, not in C, not in any similar language.
The outcome of a non-conforming program may depend on many things, such as whether a variable is placed in memory or in a register, the alignment of variables in memory, the position of stack frames... You can't count on anything at all. These details obviously depend on the optimization level.
If the program accesses an array out of bounds, but the address happens to fall in memory that still belongs to the process, a segfault may not happen. It is just some bytes in memory that the process is allowed to read or write (or both). You may be overwriting some other variable, you may be reading garbage from an old stack frame, you may be overwriting malloc's internal book-keeping data and corrupting the heap. The crash may be waiting to happen somewhere else, or maybe the numeric result of the program will just be slightly wrong. Anything can happen.
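To make the error show up reliably regardless of optimization level, build the demo with runtime checking or a sanitizer (both are standard gfortran options):
gfortran -O0 -g -fcheck=all -o segfault.exe segfault.f90
gfortran -O0 -g -fsanitize=address -o segfault.exe segfault.f90
With -fcheck=all the run should stop with an explicit out-of-bounds error as soon as i reaches 11; with the address sanitizer you should get a report and stack trace at the first bad access.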
I'm trying to compile some old FORTRAN 77 programs with gfortran and I am getting an error with allocatable arrays.
If I define arrays in f90-style, like:
REAL*8,allocatable::somearray(:)
everything is fine, but in those old programs arrays are defined as:
REAL*8 somearray[ALLOCATABLE](:)
which causes gfortran to produce the error:
REAL*8,allocatable::somearray[ALLOCATABLE](:)
1
Fatal Error: Coarrays disabled at (1), use -fcoarray= to enable
I would really like to avoid rewriting whole programs in f90 style, so could you please tell me: is there any way to force gfortran to compile this?
Thanks a lot.
For standard checking you can use the -std flag:
-std=std
Specify the standard to which the program is expected to conform, which may be one of 'f95', 'f2003', 'f2008', 'gnu', or 'legacy'.
To "force" gfortran to compile your code, you have to use syntax it recognizes
I'd probably go for search and replace. For example,
sed 's/\(REAL\*8\)[[:blank:]]\+\([^[]\+\)\[ALLOCATABLE\]\(.*\)/\1, allocatable :: \2\3/' <old.source> > <new.source>
where sed is available.
Of course, be careful with sed :).
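Applied to the declaration from the question, that substitution should turn
REAL*8 somearray[ALLOCATABLE](:)
into
REAL*8, allocatable :: somearray(:)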
In any case, since your code seems to have been written in some non-standard dialect of old Fortran, you'll probably need to make some changes anyway.
For what it's worth the Intel Fortran compiler (v13.something) compiles the following micro-program without complaint. This executes and writes 10 to the terminal:
REAL*8 somearray[ALLOCATABLE](:)
allocate(somearray(10))
print *, size(somearray)
end
Given the history of the Intel compiler I suspect that the strange declaration is an extension provided by DEC Fortran, possibly an early implementation of what was later standardised in Fortran 90.
This program crashes with Illegal instruction: 4 on MacOSX Lion and ifort (IFORT) 12.1.0 20111011
program foo
real, pointer :: a(:,:), b(:,:)
allocate(a(5400, 5400))
allocate(b(5400, 3600))
a=1.0
b(:, 1:3600) = a(:, 1:3600)
print *, a
print *, b
deallocate(a)
deallocate(b)
end program
The same program works with gfortran. I don't see any problem. Any ideas? Unrolling the copy into an explicit loop over the columns works in both compilers.
Note that with allocatable instead of pointer I have no problems.
The behavior is the same if the statement is either inside a module or not.
I confirm the same behavior on ifort (IFORT) 12.1.3 20120130.
Apparently, no problem occurs with Linux and ifort 12.1.5
I tried to increase the stack size with the following linking options
ifort -Wl,-stack_size,0x40000000,-stack_addr,0xf0000000 test.f90
but I still get the same error. Increasing ulimit -s to the hard limit gives the same problem.
Edit 2: I did some more debugging and apparently the problem happens when the array slicing operation
b(:, 1:3600) = a(:, 1:3600)
involves a value suspiciously close to 16 M of data.
I am comparing the opcodes produced, but if there is a way to see a more informative intermediate code form, I'd gladly appreciate it.
Your program is correct (though I would prefer allocatable to pointer if you do not need to be able to repoint it). The problem is that ifort by default places all array temporaries on the stack, no matter how large they are. And it seems to need an array temporary for the copy operation you are doing here. To work around ifort's stupid default behavior, always use the -heap-arrays flag when compiling. I.e.
ifort -o test test.f90 -heap-arrays 1600
The number after -heap-arrays is the threshold at which it starts using the heap; for sizes below this, the stack is used. I chose a pretty low number here - you can probably safely use higher ones. In theory stack arrays are faster, but the difference is usually totally negligible. I wish Intel would fix this behavior. Every other compiler has sensible defaults for this setting.
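Alternatively, the explicit column loop the question already mentions avoids the full-size temporary; a minimal sketch (j is just a hypothetical local loop index):
integer :: j
do j = 1, 3600
   b(:, j) = a(:, j)   ! one column at a time, so any temporary is only one column long
end do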
Use "allocatable" instead of "pointer".
real, allocatable :: a(:,:), b(:,:)
Assigning a floating point number to a pointer looks dubious to me.
Recently, I read a post on Stack Overflow about finding integers that are perfect squares. As I wanted to play with this, I wrote the following small program:
PROGRAM PERFECT_SQUARE
IMPLICIT NONE
INTEGER*8 :: N, M, NTOT
LOGICAL :: IS_SQUARE
N=Z'D0B03602181'
WRITE(*,*) IS_SQUARE(N)
NTOT=0
DO N=1,1000000000
IF (IS_SQUARE(N)) THEN
NTOT=NTOT+1
END IF
END DO
WRITE(*,*) NTOT ! should find 31622 squares
END PROGRAM
LOGICAL FUNCTION IS_SQUARE(N)
IMPLICIT NONE
INTEGER*8 :: N, M
! check if negative
IF (N.LT.0) THEN
IS_SQUARE=.FALSE.
RETURN
END IF
! check if ending 4 bits belong to (0,1,4,9)
M=IAND(N,15)
IF (.NOT.(M.EQ.0 .OR. M.EQ.1 .OR. M.EQ.4 .OR. M.EQ.9)) THEN
IS_SQUARE=.FALSE.
RETURN
END IF
! try to find the nearest integer to sqrt(n)
M=DINT(SQRT(DBLE(N)))
IF (M**2.NE.N) THEN
IS_SQUARE=.FALSE.
RETURN
END IF
IS_SQUARE=.TRUE.
RETURN
END FUNCTION
When compiling with gfortran -O2, running time is 4.437 seconds, with -O3 it is 2.657 seconds. Then I thought that compiling with ifort -O2 could be faster since it might have a faster SQRT function, but it turned out running time was now 9.026 seconds, and with ifort -O3 the same. I tried to analyze it using Valgrind, and the Intel compiled program indeed uses many more instructions.
My question is why? Is there a way to find out where exactly the difference comes from?
EDITS:
gfortran version 4.6.2 and ifort version 12.0.2
times are obtained from running time ./a.out and is the real/user time (sys was always almost 0)
this is on Linux x86_64, both gfortran and ifort are 64-bit builds
ifort inlines everything, gfortran only at -O3, but the latter assembly code is simpler than that of ifort, which uses xmm registers a lot
fixed line of code, added NTOT=0 before loop, should fix issue with other gfortran versions
When the complex IF statement is removed, gfortran takes about 4 times as long (10-11 seconds). This is to be expected, since the statement throws out roughly 75% of the numbers and avoids doing the SQRT on them. On the other hand, ifort only takes slightly more time. My guess is that something goes wrong when ifort tries to optimize the IF statement.
EDIT2:
I tried with ifort version 12.1.2.273; it's much faster, so it looks like they fixed that.
What compiler versions are you using?
Interestingly, it looks like a case where there is a performance regression from 11.1 to 12.0 -- e.g. for me, 11.1 (ifort -fast square.f90) takes 3.96s, and 12.0 (same options) took 13.3s.
gfortran (4.6.1) (-O3) is still faster (3.35s).
I have seen this kind of a regression before, although not quite as dramatic.
BTW, replacing the if statement with
is_square = any(m == [0, 1, 4, 9])
if(.not. is_square) return
makes it run twice as fast with ifort 12.0, but slower in gfortran and ifort 11.1.
It looks like part of the problem is that 12.0 is overly aggressive in trying to vectorize things: adding
!DEC$ NOVECTOR
right before the DO loop (without changing anything else in the code) cuts the run time down to 4.0 sec.
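For reference, the directive just sits immediately before the loop from the original program:
!DEC$ NOVECTOR
DO N=1,1000000000
IF (IS_SQUARE(N)) THEN
NTOT=NTOT+1
END IF
END DO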
Also, as a side benefit: if you have a multi-core CPU, try adding -parallel to the ifort command line :)
I am trying to write an array to file, where I have opened the file this way:
open(unit=20, FILE="output.txt", form='unformatted', access='direct', recl=sizeof(u))
Here, u is an array and sizeof(u) is 2730025920, which is ~2.5GB.
When I run the program, I get an error Fortran runtime error: RECL parameter is non-positive in OPEN statement, which I believe means that the record size is too large.
Is there a way to handle this? One option would be to write the array in more than one write call such that the record size in each write is smaller than 2.5GB. But I am wondering if I can write the entire array in a single call.
Edit:
u has been declared as double precision u(5,0:408,0:408,0:407)
The program was compiled as gfortran -O3 -fopenmp -mcmodel=medium test.f
There is some OpenMP code in this program, but the file I/O is sequential.
gfortran v 4.5.0, OS: Opensuse 11.3 on 64 bit AMD Opteron
Thanks for your help.
You should be able to write big arrays as long as memory permits. It seems like you are getting integer overflow with the sizeof function. sizeof is not standard Fortran and I would not recommend using it (implementations may vary between compilers). Instead, it is better practice to use the inquire statement to obtain the record length. I was able to reproduce your problem with ifort, and this solution works for me. You can avoid the integer overflow by declaring a variable of a higher kind:
integer(kind=8) :: reclen
inquire(iolength=reclen)u
open(unit=20,file='output.txt',form='unformatted',&
access='direct',recl=reclen)
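If a single record of this size is still rejected, here is a minimal sketch of the chunked-write option mentioned in the question, writing one record per index of the last dimension (the loop bounds assume the declaration u(5,0:408,0:408,0:407) given above):
integer :: reclen, k
inquire(iolength=reclen) u(:,:,:,0)   ! record length of one slice
open(unit=20, file='output.txt', form='unformatted', &
access='direct', recl=reclen)
do k = 0, 407
write(20, rec=k+1) u(:,:,:,k)   ! direct-access record numbers start at 1
end do
close(20)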
EDIT: After some investigation, this seems to be a gfortran problem. Setting a higher kind for integer reclen solves the problem for ifort and pgf90, but not for gfortran - I just tried this with version 4.6.2. Even though reclen has the correct positive value, it seems that recl is 32-bit signed integer internally with gfortran (Thanks #M.S.B. for pointing this out). The Fortran run-time error suggests this, and not that the value is larger than maximum. I doubt it is an OS issue. If possible, try using ifort (free for non-commercial use): Intel Non-Commercial Software Download.