This program crashes with Illegal instruction: 4 on MacOSX Lion and ifort (IFORT) 12.1.0 20111011
program foo
real, pointer :: a(:,:), b(:,:)
allocate(a(5400, 5400))
allocate(b(5400, 3600))
b(:, 1:3600) = a(:, 1:3600)
print *, a
print *, b
end program
The same program works with gfortran, and I don't see any problem in the code. Any ideas? Replacing the array assignment with an explicit loop over the columns works with both compilers.
Note that with allocatable instead of pointer I have no problems.
The behavior is the same if the statement is either inside a module or not.
I confirm the same behavior on ifort (IFORT) 12.1.3 20120130.
Apparently, no problem occurs with Linux and ifort 12.1.5
I tried to increase the stack size with the following linking options
ifort -Wl,-stack_size,0x40000000,-stack_addr,0xf0000000 test.f90
but I still get the same error. Raising ulimit -s to the hard limit makes no difference.
Edit 2: I did some more debugging, and apparently the problem happens when the array slicing operation
b(:, 1:3600) = a(:, 1:3600)
involves a value suspiciously close to 16 M of data.
I am comparing the opcodes produced, but if there is a way to see an intermediate code form that is more communicative, I'd gladly appreciate it.

Your program is correct (though I would prefer allocatable to pointer if you do not need to be able to repoint it). The problem is that ifort by default places all array temporaries on the stack, no matter how large they are. And it seems to need an array temporary for the copy operation you are doing here. To work around ifort's stupid default behavior, always use the -heap-arrays flag when compiling. I.e.
ifort -o test test.f90 -heap-arrays 1600
The number behind -heap-arrays is the threshold where it should begin using the heap. For sizes below this, the stack is used. I chose a pretty low number here - you can probably safely use higher ones. In theory stack arrays are faster, but the difference is usually totally negligible. I wish intel would fix this behavior. Every other compiler has sensible defaults for this setting.
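The question already notes that copying column by column works with both compilers. A minimal sketch of that workaround, assuming the same array shapes as in the question (the program name foo_loop, the index j, and the initialization of a are my additions): each column slice is contiguous and only about 21 KB (5400 x 4 bytes), so any temporary the compiler might still create stays far below the stack limit.

```fortran
program foo_loop
  implicit none
  real, pointer :: a(:,:), b(:,:)
  integer :: j

  allocate(a(5400, 5400))
  allocate(b(5400, 3600))
  a = 0.0                      ! give a defined values before copying (not in the original)

  ! Copy one column at a time instead of the whole 5400 x 3600 block,
  ! so no large array temporary is needed for the assignment.
  do j = 1, size(b, 2)
    b(:, j) = a(:, j)
  end do

  print *, sum(b)
  deallocate(a, b)
end program foo_loop
```

This avoids the problem without compiler flags, at the cost of a slightly more verbose copy.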

Use "allocatable" instead of "pointer".
real, allocatable :: a(:,:), b(:,:)
Assigning a floating point number to a pointer looks dubious to me.


Reading real*8 variable with value 0 with real*4 results in a large number in Fortran without warning

Reading a real*4 variable with value 0 as real*8 results in a large number, sometimes without a warning.
I'm not good at Fortran. I was just running Fortran code I got from someone else, and it produced a segmentation fault. While debugging it, I found that one of the subroutines reads a variable with value 0 that is defined as real*8 as if it were real*4, which results in a large value.
I tried to reproduce it with simple code, but the compiler showed a warning for the argument mismatch. I had to split the code across nested calls to reproduce the suppressed warning, and I'm not sure what the exact condition for suppressing the warning is.
Actually, for some reason, I'm suspecting it may be the problem of my compiler, as the code (not the example code, original code) ran fine on the PC of the person who gave me the code.
file hello.f:
      implicit none
      call sdo()
      end
file test.f:
      subroutine sdo()
      implicit none
      real*4 dsecs
      dsecs = 0.0
      write(0,*) dsecs
      call sd(dsecs)
      end
file test2.f:
      subroutine sd(dsecs)
      implicit none
      real*8 dsecs
      write(0,*) dsecs
      end
compilation and execution:
$ gfortran -o hello hello.f test.f test2.f
$ ./hello
Expected result:
0.00000000
0.0000000000000000
Actual results:
0.00000000
It is not the problem of the compiler. It is the problem of the code. Your code did issue a warning for me that you were doing something nefarious, as it should. The subroutine that thinks dsecs is 4 bytes long sent 4 bytes. The subroutine that thinks dsecs is 8 bytes long looked at 8 bytes. What's in the other 4 bytes? Who knows. What does it look like when the two get mixed together? Probably not what you want. It's like accidentally getting served a scoopful of half ice cream and half garbage: unlikely to taste the way you thought.
This is one of those problems that are very simply solved with that classic joke: "Doctor, doctor, it hurts when I do this!" - "Then... don't do that."
EDIT: Sorry, I cheated. I didn't compile them as separate programs. When I do, I don't get warnings. This is also normal - at compilation step, you didn't specify how foreign subroutines look so it couldn't complain, and at linking step compiler doesn't check any more.
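One way to get the check back across files is to give the callee an explicit interface, for example by putting it in a module. The following is a minimal sketch under that assumption; the module name sd_mod and the use of the real32/real64 kind constants are my choices, not part of the original code. With the interface visible, the mismatched call no longer compiles, so the conversion has to be written out.

```fortran
module sd_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  subroutine sd(dsecs)
    real(real64), intent(in) :: dsecs   ! 8-byte real, as in test2.f
    write(*,*) dsecs
  end subroutine sd
end module sd_mod

program hello
  use, intrinsic :: iso_fortran_env, only: real32, real64
  use sd_mod, only: sd
  implicit none
  real(real32) :: dsecs                 ! 4-byte real, as in test.f

  dsecs = 0.0_real32
  write(*,*) dsecs

  ! call sd(dsecs)   ! rejected at compile time: the interface says real64
  call sd(real(dsecs, real64))          ! explicit conversion to the expected kind
end program hello
```

Because the module carries the interface, the check also works when the module and the caller live in separate source files.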

Segmentation fault for array, but only if a component of a derived type

Pretty simple setup, using gfortran 4.8.5 on linux (red hat):
I get a segfault if my array of reals (inside a derived type) has size > 2,000,000. This seems to be a standard stack/heap issue, as my stack size is 8 MB if I check with ulimit.
There is no problem if the array is NOT inside a derived type.
Note that, as @francescalus guesses, removing the initial value = 0.0 eliminates the problem.
Edit to add: Note that I have posted a followup question Segmentation fault related to component of derived type that represents a more realistic use case and further narrows down the conditions under which this seems to occur.
program main
call sub1 ! seg fault if col size > 2,100,000
call sub2 ! works fine at col size = 100,000,000
end program main
subroutine sub1
type table
real :: col(2100000) = 0.0 ! works if "= 0.0" removed
end type table
type(table) :: table1
table1%col = 1.0
end subroutine sub1
subroutine sub2
real :: col(100000000) = 0.0
col = 1.0
end subroutine sub2
Some obvious questions here:
Is this expected behavior, or some bug that was fixed in newer versions of gfortran?
Am I following standard fortran operating procedures here, or doing something wrong?
What is the recommended way to avoid this (please assume that I am unable to update to a newer version of gfortran in the near term)? I will almost certainly solve with an allocatable array component for reasons not specific to this question, but that might not be an ideal general solution and I would like to know of all good options I have here.
In particular, is initializing the components of a derived type bad practice?
This is likely to be a runtime issue due to insufficient stack, rather than a bug with gfortran.
Gfortran uses the stack to store automatic arrays and other initialization data. When code does not create problems when one such array is small, but segfaults when the size of the array increases, a possible reason is running out of stack.
The issue seems to be the same in more recent versions of gfortran. I compiled and ran your program with gfortran 4.8.4, 4.9.3, 5.5.0, 6.4.0, 7.3.0 and 8.2.0. In all cases I obtained a segmentation fault with the default stack size, but no error when the stack size was slightly increased.
$ ./sfa
Segmentation fault
$ ulimit -s
$ ulimit -s 8256
$ ./sfa && echo "DONE"
Your problem may be solved by running
$ ulimit -s unlimited
before executing your binary. I am not aware of any particular penalty for doing this, but programmers more aware of the fine details of memory management, such as compiler developers, may think otherwise.
Initializing the components of a derived type is not bad practice, but as you can see, it can create problems with the stack if the component is a big array - be it due to the storage of the component itself, or to the storage of memory to work on the RHS of the assignment. If the component is made allocatable and allocated in a subroutine, the array is stored in the heap rather than in the stack, and this issue is usually avoided. In this case, it may be about actually setting the values of the array dynamically in a subroutine rather than at compile time. It may be less elegant, but I think it's worth it, since it's the typical example of code development work that prevents avoidable, environment-related errors when executing the binary.
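A minimal sketch of the allocatable-component approach described above, assuming the same component name col as in the question; the module name table_alloc_mod and the helper init_table are hypothetical. The array is allocated and given its initial value at run time, so nothing of size n has to be built on the stack for default initialization.

```fortran
module table_alloc_mod
  implicit none
  type :: table
    real, allocatable :: col(:)        ! storage is obtained with allocate, not from the stack
  end type table
contains
  ! Hypothetical helper: allocate the component and set its initial value at run time.
  subroutine init_table(t, n)
    type(table), intent(out) :: t
    integer, intent(in) :: n
    allocate(t%col(n))
    t%col = 0.0
  end subroutine init_table
end module table_alloc_mod

program main
  use table_alloc_mod, only: table, init_table
  implicit none
  type(table) :: table1

  call init_table(table1, 2100000)     ! size that triggered the segfault in the question
  table1%col = 1.0
  print *, size(table1%col), table1%col(1)
end program main
```

The component is deallocated automatically when table1 goes out of scope, so no explicit cleanup is needed here.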
Your code above is standards compliant. As explained in the comments, lack of explicit interfaces for subroutines is not good practice, but for these simple subroutines it's not against the rules.
Some compilers have flags that allow you to change where some objects are allocated in memory. While it may fix a particular issue, flags are compiler dependent, and usually not equivalent when comparing different compilers. Using dynamic memory via allocatables is a more robust solution, according to my experience.
Finally, note that, if you are using OpenMP, the ulimit command above only affects the master thread - you need to set the stack size of each of the other threads via the environment variable OMP_STACKSIZE, which cannot be unlimited. And bear in mind that non-master threads running out of stack are a problem much more difficult to diagnose, since the binary may stop without a proper Segmentation fault error.
These are not necessarily useful solutions, but below are some conditions under which the seg fault disappears. A couple of people mentioned the lack of an explicit interface (as bad practice though not technically incorrect), and it seems that this might be one key here as either of these two changes to the code gets rid of the seg fault, although it's not quite that simple, as I'll explain:
1. Put everything in main, with no subroutine calls
2. Put the type definition table in a module
Let me expand on #2 briefly. Simply taking the example in the OP and then giving it an explicit interface by putting the subroutine in a module does NOT work. However, if I put the type definition in a module and then use it (as shown below) the segfault does not occur:
module table_mod
type table
real :: col(2100000) = 0.0
end type table
end module table_mod
program main
use table_mod
type(table) :: table1
table1%col = 1.0
end program main

SEGFAULT disappears with lower optimisation level?

So, I want to help my researchers a bit with debugging Fortran programs, and for demonstration purposes I created a program that intentionally causes a segfault.
Here's the source:
program segfault
implicit none
integer :: n(10), i
integer :: ios, u
open(newunit=u, file='data.txt', status='old', action='read', iostat=ios)
if (ios /= 0) STOP "error opening file"
i = 0
do
i = i + 1
read(u, *, iostat=ios) n(i)
if (ios /= 0) exit
end do
print*, sum(n)
end program segfault
The data.txt file contains 100 random numbers:
for i in {1..100}; do
echo $RANDOM >> data.txt;
done
When I compile this program with
gfortran -O3 -o segfault.exe segfault.f90
the resulting executable dutifully crashes. But when I compile with debugging enabled:
gfortran -O0 -g -o segfault.exe segfault.f90
Then it reads in only the first 10 values, and prints their sum. For what it's worth, -O2 causes the desired segfault, -O1 does not.
I find this deeply concerning. After all, how can I debug properly if the bug goes away when I compile with debugging symbols enabled?
Can someone explain this behaviour?
I am using GNU Fortran (MacPorts gcc5 5.3.0_1) 5.3.0
An out-of-bounds access is undefined behaviour, and a segfault is only one of its possible symptoms. The program does not conform to the Fortran standard, so you cannot expect any particular outcome. It can do anything at all. You cannot count on a segfault happening, much less be deeply concerned when it does not happen.
There are compiler checks (-fcheck=) and sanitizers (-fsanitize=) available for a reason. Waiting for a segfault is not guaranteed to work. Not in Fortran, not in C, not in any similar language.
The outcome of a non-conforming program may depend on many things, like the placement of a variable in memory or in a register, the alignment of variables in memory, the position of stack frames... You can't count on anything at all. These details obviously depend on the optimization level.
If the program accesses an array out of bounds, but the address in memory happens to be a part of memory which still belongs to the process, a segfault may not happen. It is just some bytes in memory which the process is allowed to read or write to (or both). You may be overwriting some other variable, you may be reading some garbage from some old stack frame, you may be overwriting malloc's internal book-keeping data and corrupting the heap. The crash may be waiting to happen somewhere else, or maybe just the numeric result of the program will be slightly wrong. Anything can happen.
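For the demonstration program in the question, compiling with gfortran -g -fcheck=bounds should turn the overrun into a run-time error message at any optimisation level, instead of leaving the outcome to chance. Independently of compiler flags, the loop can guard the index itself. A minimal sketch, assuming the same data.txt file; the program name safe_read and the temporary variable val are my additions:

```fortran
program safe_read
  implicit none
  integer :: n(10), i, val
  integer :: ios, u

  open(newunit=u, file='data.txt', status='old', action='read', iostat=ios)
  if (ios /= 0) stop "error opening file"

  n = 0
  i = 0
  do
    read(u, *, iostat=ios) val          ! read into a scalar first
    if (ios /= 0) exit                  ! end of file or read error
    if (i >= size(n)) then              ! refuse to write past the end of n
      write(*,*) 'more than', size(n), 'values in file, ignoring the rest'
      exit
    end if
    i = i + 1
    n(i) = val
  end do
  close(u)

  print *, sum(n(1:i))
end program safe_read
```

Reading into a scalar before storing means the array index is checked before any element is written.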

Fortran I/O: Specifying large record sizes

I am trying to write an array to file, where I have opened the file this way:
open(unit=20, FILE="output.txt", form='unformatted', access='direct', recl=sizeof(u))
Here, u is an array and sizeof(u) is 2730025920, which is ~2.5GB.
When I run the program, I get an error Fortran runtime error: RECL parameter is non-positive in OPEN statement, which I believe means that the record size is too large.
Is there a way to handle this? One option would be to write the array in more than one write call such that the record size in each write is smaller than 2.5GB. But I am wondering if I can write the entire array in a single call.
u has been declared as double precision u(5,0:408,0:408,0:407)
The program was compiled as gfortran -O3 -fopenmp -mcmodel=medium test.f
There is some OpenMP code in this program, but the file I/O is sequential.
gfortran v 4.5.0, OS: Opensuse 11.3 on 64 bit AMD Opteron
Thanks for your help.
You should be able to write big arrays as long as it's memory permitting. It seems like you are getting integer overflow with the sizeof function. sizeof is not Fortran standard and I would not recommend using it (implementations may vary between compilers). Instead, it is a better practice to use the inquire statement to obtain record length. I was able to reproduce your problem with ifort and this solution works for me. You can avoid integer overflow by declaring a higher kind variable:
integer(kind=8) :: reclen
EDIT: After some investigation, this seems to be a gfortran problem. Setting a higher kind for integer reclen solves the problem for ifort and pgf90, but not for gfortran - I just tried this with version 4.6.2. Even though reclen has the correct positive value, it seems that recl is a 32-bit signed integer internally with gfortran (thanks @M.S.B. for pointing this out). The Fortran run-time error suggests this, and not that the value is larger than the maximum. I doubt it is an OS issue. If possible, try using ifort (free for non-commercial use): Intel Non-Commercial Software Download.
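If switching compilers is not an option, the multi-write approach mentioned in the question keeps each record well below the 32-bit limit. The sketch below is only an illustration under stated assumptions: the array extents are deliberately tiny placeholders (the question uses u(5,0:408,0:408,0:407)), the file name output.dat and the program name write_slabs are mine, and one record is written per value of the last index. With the real extents, one such slab is 5 x 409 x 409 x 8 bytes, roughly 6.7 MB. The record length comes from inquire(iolength=...) rather than sizeof, so it is expressed in whatever unit the compiler uses for recl.

```fortran
program write_slabs
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  ! Placeholder extents so the sketch runs quickly (assumption, not the real problem size).
  integer, parameter :: n2 = 8, n3 = 8, n4 = 7
  real(real64), allocatable :: u(:,:,:,:)
  integer :: reclen, k, ios

  allocate(u(5, 0:n2, 0:n3, 0:n4))
  u = 1.0_real64

  ! Record length of ONE slab u(:,:,:,k), in the compiler's own recl units.
  inquire(iolength=reclen) u(:,:,:,0)

  open(unit=20, file='output.dat', form='unformatted', access='direct', &
       recl=reclen, action='write', status='replace', iostat=ios)
  if (ios /= 0) stop 'error opening file'

  ! One direct-access record per slab; record numbers start at 1.
  do k = 0, n4
    write(20, rec=k+1) u(:,:,:,k)
  end do
  close(20)
end program write_slabs
```

Reading the file back uses the same recl and rec=k+1 mapping, one slab at a time.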

Stack overflow in Fortran 90

I have written a fairly large program in Fortran 90. It has been working beautifully for quite a while, but today I tried to step it up a notch and increase the problem size (it is a research non-standard FE-solver, if that helps anyone...) Now I get the "stack overflow" error message and naturally the program terminates without giving me anything useful to work with.
The program starts with setting up all relevant arrays and matrices, and after that is done it prints a few lines of stats regarding this to a log-file. Even with my new, larger problem, this works fine (albeit a little slow), but then it fails as the "number crunching" gets going.
What confuses me is that everything at that point is already allocated (and that worked without errors). I'm not entirely sure what the stack is (Wikipedia and several threads here didn't do much since I have only a quite basic knowledge of the "behind the scenes" workings of a computer).
Assume that I for instance have some arrays initialized as:
which after some initialization routines (i.e. read input from file and such) are allocated as (I store some size-integers for easier passing to subroutines in IA of fixed size):
ALLOCATE( AA(N1,N2) , BB(N1,N2) )
IA(1) = N1
IA(2) = N2
This is basically what happens in the initial portion, and so far so good. But when I then call a subroutine
And the routine looks like (nothing fancy):
do lots of other stuff
Now I get an error! The output to the screen says:
forrtl: severe (170): Program Exception - stack overflow
However, when I run the program with the debugger it breaks at line 419 in a file called winsig.c (not my file, but probably part of the compiler?). It seems to be part of a routine called sigreterror: and it is the default case that has been invoked, returning the text Invalid signal or error. There is a comment line attached to this which strangely says /* should never happen, but compiler can't tell */ ...?
So I guess my question is, why does this happen and what is actually happening? I thought that as long as I can allocate all the relevant memory I should be fine? Does the call to the subroutine make copies of the arguments, or just pointers to them? If the answer is copies then I can see where the problem might be, and if so: any ideas on how to get around it?
The problem I try to solve is big, but not insane in any way. Standard FE-solvers can handle bigger problems than my current one. I run the program on a Dell PowerEdge 1850 and the OS is Microsoft Server 2008 R2 Enterprise. According to systeminfo at the cmd prompt I have 8GB of physical memory and almost 16GB virtual. As far as I understand the total of all my arrays and matrices should not add up to more than maybe 100MB - about 5.5M integer(4) and 2.5M real(8) (which according to me should be only about 44MB, but let's be fair and add another 50MB for overhead).
I use the Intel Fortran compiler integrated with Microsoft Visual Studio 2008.
Adding some actual source code to clarify a bit
! Update continuum state
CALL UpdateContinuumState(iTask,iArray,posc,dof,dof_k,nodedof,elm,&
is the actual call to the routine. Big arrays are posc, bmtrx and aa - all others are at least an order of magnitude smaller (if not more). posc is INTEGER(4), and bmtrx and aa are REAL(8).
SUBROUTINE UpdateContinuumState(iTask,iArray,posc,dof,dof_k,nodedof,elm,bmtrx,&
INTEGER(4) :: iTask, errmsg
INTEGER(4) :: iArray(64)
INTEGER(4),DIMENSION(iArray(15),iArray(15),iArray(5)) :: posc
INTEGER(4),DIMENSION(iArray(22),iArray(21)+1) :: nodedof
INTEGER(4),DIMENSION(iArray(29),iArray(3)+2) :: elm
REAL(8),DIMENSION(iArray(14)) :: dof, dof_k
REAL(8),DIMENSION(iArray(12)*iArray(17),iArray(15)*iArray(5)) :: bmtrx
REAL(8),DIMENSION(iArray(5)*iArray(17)) :: detjac
REAL(8),DIMENSION(iArray(17)) :: w
REAL(8),DIMENSION(iArray(23),iArray(19)) :: mtrlprops
REAL(8),DIMENSION(iArray(8),iArray(8),iArray(23)) :: demtrx
REAL(8) :: dt
REAL(8),DIMENSION(2,iArray(12)*iArray(17)*iArray(5)) :: stress
REAL(8),DIMENSION(iArray(12)*iArray(17)*iArray(5)) :: strain
REAL(8),DIMENSION(2,iArray(17)*iArray(5)) :: effstrain, effstress
REAL(8),DIMENSION(iArray(25)) :: aa
REAL(8),DIMENSION(iArray(14)) :: fi
INTEGER(4) :: i, e, mtrl, i1, i2, j1, j2, k1, k2, dim, planetype, elmnodes, &
Nec, elmpnodes, Ndisp, Nstr, Ncomp, Ngpt, Ndofelm
INTEGER(4),DIMENSION(iArray(15)) :: doflist
REAL(8),DIMENSION(iArray(12)*iArray(17),iArray(15)) :: belm
REAL(8),DIMENSION(iArray(17)) :: jelm
REAL(8),DIMENSION(iArray(12)*iArray(17)*iArray(5)) :: dstrain
REAL(8),DIMENSION(iArray(12)*iArray(17)) :: s
REAL(8),DIMENSION(iArray(17)) :: ep, es, dep
REAL(8),DIMENSION(iArray(15),iArray(15)) :: kelm
REAL(8),DIMENSION(iArray(15)) :: felm
dim = iArray(1)
And it fails before the last line above.
As per steabert's request, I'll just summarize the conversation in the comments here where it's a bit more visible, even though M.S.B.'s answer already gets right to the nub of the problem.
In technical programming, where procedures often have large local arrays for intermediate computation, this happens a lot. Local variables are generally stored on the stack, which is typically (and quite reasonably) a small fraction of overall system memory -- usually of order 10MB or so. When the local variable sizes exceed the stack size, you see exactly the symptoms described here -- a stack overflow occurring after a call to the relevant subroutine but before its first executable statement.
So when this problem happens, the best thing to do is to find the relevant large local variables, and decide what to do. In this case, at least the variables belm and dstrain were getting quite sizable.
Once the variables are located, and you've confirmed that's the problem, there are a few options. As MSB points out, if you can make your arrays smaller, that's one option. Alternatively, you can make the stack size larger; under Linux, that's done with ulimit -s [newsize]. That really just postpones the problem, though, and you have to do something different on Windows machines.
The other class of ways to avoid this problem is not to put the large data on the stack, but in the rest of memory (the "heap"). You can do that by giving the arrays the save attribute (in C, static); this takes the variable off the stack (strictly speaking it lands in static storage rather than the heap proper) and thus makes the values persistent between calls. The downside there is that this potentially changes the behavior of the subroutine, and means the subroutine can't be used recursively, and similarly is non-threadsafe (if you're ever in a position where multiple threads will enter the routine simultaneously, they'll each see the same copy of the local variable and potentially overwrite each other's results). The upside is that it's easy and very portable -- it should work everywhere. However, this will only work with fixed-size local variables; if the temporary arrays have sizes that depend on the inputs, you can't do this (since there'd no longer be a single variable to save; it could be a different size every time the procedure is called).
There are compiler-specific options which put all arrays (or all arrays larger than some given size) on the heap rather than on the stack; every Fortran compiler I know has an option for this. For ifort, used in the OP's post, it's -heap-arrays on Linux, or /heap-arrays on Windows. For gfortran, this may actually be the default. This is good for making sure you know what's going on, but it means you have to have different incantations for every compiler to make sure your code works.
Finally, you can make the offending arrays allocatable. Allocated memory goes on the heap; but the variable which points to them is on the stack, so you get the benefits of both approaches. Also, this is completely standard fortran and so totally portable. The downside is that it requires code changes. Also, the allocation process can take nontrivial amounts of time; so if you're going to be calling the routine zillions of times, you may notice this slows things down slightly. (This possible performance regression is easy to fix, though; if you'll be calling it zillions of times with the same size arrays, you can have an optional argument to pass in a pre-allocated local array and use that instead, so that you only allocate/deallocate once).
Allocating/deallocating each time would look like:
SUBROUTINE UpdateContinuumState(iTask,iArray,posc,dof,dof_k,nodedof,elm,bmtrx,&
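REAL(8),DIMENSION(:,:), allocatable :: belm
REAL(8),DIMENSION(:), allocatable :: dstrain
allocate(belm(iArray(12)*iArray(17),iArray(15)))
allocate(dstrain(iArray(12)*iArray(17)*iArray(5)))
!... work
deallocate(belm)
deallocate(dstrain)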
Note that if the subroutine does a lot of work (e.g., takes seconds to execute), the overhead from a couple of allocate/deallocates should be negligible. If not, and you want to avoid the overhead, using the optional arguments for preallocated workspace would look something like:
SUBROUTINE UpdateContinuumState(iTask,iArray,posc,dof,dof_k,nodedof,elm,bmtrx,&
real(8),dimension(:,:), optional, target :: workbelm
real(8),dimension(:), optional, target :: workdstrain
REAL(8),DIMENSION(:,:), pointer :: belm
REAL(8),DIMENSION(:), pointer :: dstrain
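if (present(workbelm)) then
belm => workbelm
else
allocate(belm(iArray(12)*iArray(17),iArray(15)))
end if
if (present(workdstrain)) then
dstrain => workdstrain
else
allocate(dstrain(iArray(12)*iArray(17)*iArray(5)))
end if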
!... work
if (.not.(present(workbelm))) deallocate(belm)
if (.not.(present(workdstrain))) deallocate(dstrain)
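The same optional-workspace pattern in a self-contained form, as a rough sketch only: the module work_demo_mod, the routine crunch, and its arguments are hypothetical stand-ins for UpdateContinuumState and its local arrays. The caller may pass a preallocated buffer to reuse across calls; if it does not, the routine allocates and frees its own scratch space.

```fortran
module work_demo_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  subroutine crunch(n, total, work)
    integer, intent(in) :: n
    real(real64), intent(out) :: total
    real(real64), optional, target, intent(inout) :: work(:)  ! caller-owned scratch, size >= n
    real(real64), pointer :: tmp(:)
    integer :: i

    if (present(work)) then
      tmp => work(1:n)          ! reuse the caller's buffer, no allocation
    else
      allocate(tmp(n))          ! fall back to a private heap allocation
    end if

    do i = 1, n
      tmp(i) = real(i, real64)
    end do
    total = sum(tmp)

    ! Only free what this routine allocated itself.
    if (.not. present(work)) deallocate(tmp)
  end subroutine crunch
end module work_demo_mod

program demo
  use, intrinsic :: iso_fortran_env, only: real64
  use work_demo_mod, only: crunch
  implicit none
  integer, parameter :: n = 1000
  real(real64), allocatable, target :: buffer(:)
  real(real64) :: total
  integer :: k

  allocate(buffer(n))
  do k = 1, 3
    call crunch(n, total, buffer)   ! repeated calls share one allocation
  end do
  call crunch(n, total)             ! also works without a workspace
  print *, total
end program demo
```

Keeping the routine in a module matters here: optional and assumed-shape dummy arguments require an explicit interface at the call site.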
Not all of the memory is created when the program starts. When you call the subroutine, the executable creates the memory that the subroutine needs for local variables. Typically arrays with simple declarations that are local to that subroutine -- neither allocatable, nor pointer -- are allocated on the stack. You could have simply run out of stack space when you reached these declarations. You might have reached a 2GB limit on a 32-bit OS with some array. Sometimes executable statements implicitly create a temporary array on the stack.
Possible solutions: 1) make your arrays smaller (not attractive), 2) make the stack larger, 3) some compilers have options to switch from placing arrays on the stack to dynamically allocating them, similar to the method used for "allocate", 4) identify large arrays and make them allocatable.
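To make option 4 concrete, here is a small sketch contrasting the two kinds of local array; all names (crunch_mod, with_automatic, with_allocatable) are hypothetical. The first routine declares an automatic array whose size depends on an argument, which is the kind of declaration that typically lands on the stack. The second does the same work with an allocatable array and checks the allocation status, so a failure produces a message rather than a crash on entry.

```fortran
module crunch_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Automatic array: created on entry, typically on the stack.
  subroutine with_automatic(n, total)
    integer, intent(in) :: n
    real(real64), intent(out) :: total
    real(real64) :: tmp(n)
    tmp = 1.0_real64
    total = sum(tmp)
  end subroutine with_automatic

  ! Allocatable array: storage comes from the heap and failure can be detected.
  subroutine with_allocatable(n, total)
    integer, intent(in) :: n
    real(real64), intent(out) :: total
    real(real64), allocatable :: tmp(:)
    integer :: ierr
    allocate(tmp(n), stat=ierr)
    if (ierr /= 0) stop 'allocation failed'
    tmp = 1.0_real64
    total = sum(tmp)
    ! tmp is deallocated automatically when the routine returns
  end subroutine with_allocatable
end module crunch_mod

program demo_stack_heap
  use, intrinsic :: iso_fortran_env, only: real64
  use crunch_mod, only: with_automatic, with_allocatable
  implicit none
  real(real64) :: total

  call with_automatic(1000, total)        ! about 8 KB, safe on any reasonable stack
  print *, total
  call with_allocatable(5000000, total)   ! about 40 MB, would risk overflowing a small stack
  print *, total
end program demo_stack_heap
```

Whether the automatic version actually overflows at a given size depends on the compiler, its flags, and the stack limit in effect.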
The stack is the memory area where the information needed to return from a function, and the information locally defined in a function is stored. So a stack overflow may indicate you have a function that calls another function which in its turn calls another function, etc.
I am not familiar with Fortran (anymore) but another cause might be that those functions declare tons of local variables, or at least variables that need a lot of space.
A last one: the stack is typically rather small, so it's not a priori relevant how much memory the machine has. It should be quite simple to instruct the linker to increase the stack size, at least if you are certain it's just a lack of space, and not a bug in your application.
Edit: do you use recursion in your program? Recursive calls can eat through the stack very quickly.
Edit: have a look at this: (emphasis mine)
On Windows, the stack space to reserve for the program is set using the /Fn compiler option, where n is the number of bytes. Additionally, the stack reserve size can be specified through the Visual Studio IDE, which adds the Microsoft Linker option /STACK: to the linker command line. To set this, go to Property Pages>Linker>System>Stack Reserve Size. There you can specify the stack size in bytes in either decimal or C-language notation. If not specified, the default stack size is 1MB.
The only problem I ran into with a similar test code, is the 2Gb allocation limit for 32-bit compilation. When I exceed it I get an error message on line 419 in winsig.c
Here is the test code
program FortranCon
implicit none
! Variables
INTEGER, PARAMETER :: N = 10960
INTEGER :: IA(64), S1
REAL(4) :: S2
REAL(8), ALLOCATABLE :: AA(:,:), BB(:,:)
ALLOCATE( AA(N,N), BB(N,N) )
AA(1:N,1:N) = 1D0
BB(1:N,1:N) = 2D0
! ... do stuff with AA, BB
S1 = SIZEOF(AA) !Size of each array
S2 = 2*DBLE(S1)/1024/1024 !Total size for 2 arrays in Mb
WRITE (*,100) S2, ' Mb' ! When allocation reached 2Gb then
100 FORMAT (F8.1,A) ! exception occurs in Win32
end program FortranCon
When N=10960 it runs ok showing 1832.9 Mb. With N=11960 it crashes. Of course when I compile with x64 it works ok. Each array has 8*N^2 bytes storage. I don't know if it helps but I recommend using the INTENT() keywords for the dummy variables.
Are you using some parallelization? This can be a problem with statically declared arrays. Try making all the bigger arrays ALLOCATABLE; otherwise, they will be placed on the stack in autoparallel or OpenMP threads.
For me the issue was the stack reserve size. I went and changed the stack reserved size from 0 to 100000000 and recompiled the code. The code now runs smoothly.