I use command_argument_COUNT() to return number of command line arguments; however, such function always return -1, which is supposed to be a non-negative value?
What are the possible causes for -1?
I use gfortran v6.1.0.
Edit 1:
Since this program is linked with many other subroutines, a whole test code is difficult to be provided here.
I also tried to test command_argument_COUNT() solely, which worked as supposed.
The issue occurred when I linked it with some other subroutines.
So, at the moment, I'm guessing if some old pieces (in F77) are causing the issues. Would that be possible?
Edit 2:
Just resolved this issue, which turned out to be the conflicts in different netcdf libraries of my system. When I cleaned them and re-set netcdf up, such issues were gone.
Related
When I run certain programs using FORTRAN 77 with the GNU gfortran compiler I have come across the same problem several times and I'm hoping someone has insight. The value I want should be ~ 1, but it writes out at the end of a program as something well over 10^100. This is generally a problem restricted to arrays for me. The improper value often goes away when I write out the value of this at some stage in the program before (which inevitably happens in trying to troubleshoot the issue).
I have tried initializing arrays explicitly and have tried some array bound checking as well as some internal logical checks in my program. The issue, to me and my research supervisor, seems to be pathological.
WRITE(*,*)"OK9999", INV,INVV
vs
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
The former gives incorrect results for what I would expect for the variables INV and INVV, the latter correct. This is the 'newest' example of this problem, which has on and off affected me for about a year.
The greater context of these lines is:
WRITE(*,*)"AFTER ENERGY",I1STOR(1),I2STOR(1)
DO 167 NP=1!,NV
IF(I1STOR(NP).NE.0) THEN
INV = I1STOR(NP)
INVV = I2STOR(NP)
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
PAUSE
ENDIF
I1STOR(1) and I2STOR(1) are written correctly in the first case "AFTER ENERGY" above. If I write out the value of NP after the DO 167 line, this will also remedy the situation.
My expectation would be that writing out a variable wouldn't affect its value. Often times I am doing large, time-intensive calculations where the ultimate value is way off and in many cases it has traced back to the situation where writing the value out (to screen or file) magically alleviates the problem. Any help would be sincerely appreciated.
Pretty simple setup, using gfortran 4.8.5 on linux (red hat):
I get a segfault if my array of reals (inside a derived type) has size > 2,000,000. This seems to be a standard stack/heap issue as my stack size is 8mb if I check with ulimit.
There is no problem if the array is NOT inside a derived type
Note that as #francescalus guesses, removing the initial value = 0.0 eliminates the problem
Edit to add: Note that I have posted a followup question Segmentation fault related to component of derived type that represents a more realistic use case and further narrows down the conditions under which this seems to occur.
program main
call sub1 ! seg fault if col size > 2,100,000
call sub2 ! works fine at col size = 100,000,000
end program main
subroutine sub1
type table
real :: col(2100000) = 0.0 ! works if "= 0.0" removed
end type table
type(table) :: table1
table1%col = 1.0
end subroutine sub1
subroutine sub2
real :: col(100000000) = 0.0
col = 1.0
end subroutine sub2
Some obvious questions here:
Is this expected behavior, or some bug that was fixed in newer versions of gfortran?
Am I following standard fortran operating procedures here, or doing something wrong?
What is the recommended way to avoid this (please assume that I am unable to update to a newer version of gfortran in the near term)? I will almost certainly solve with an allocatable array component for reasons not specific to this question, but that might not be an ideal general solution and I would like to know of all good options I have here.
In particular, is initializing the components of a derived type bad practice?
This is likely to be a runtime issue due to insufficient stack, rather than a bug with gfortran.
Gfortran uses the stack to store automatic arrays and other initialization data. When code does not create problems when one such array is small, but segfaults when the size of the array increases, a possible reason is running out of stack.
The issue seems to be the same in more recent versions of gfortran. I compiled and ran your program with gfortran 4.8.4, 4.9.3, 5.5.0, 6.4.0, 7.3.0 and 8.2.0. In all cases I obtained a segmentation fault with the default stack size, but no error when the stack size was slightly increased.
$ ./sfa
Segmentation fault
$ ulimit -s
8192
$ ulimit -s 8256
$ ./sfa && echo "DONE"
DONE
Your problem may be solved by running
$ ulimit -s unlimited
before executing your binary. I am not aware of any particular penalty for doing this, but programmers more aware of the fine details of memory management, such as compiler developers, may think otherwise.
Initializing the components of a derived type is not bad practice, but as you can see, it can create problems with the stack if the component is a big array - be it due to the storage of the component itself, or to the storage of memory to work on the RHS of the assignment. If the component is made allocatable and allocated in a subroutine, the array is stored in the heap rather than in the stack, and this issue is usually avoided. In this case, it may be about actually setting the values of the array dynamically in a subroutine rather than at compile time. It may be less elegant, but I think it's worth it, since it's the typical example of code development work that prevents avoidable, environment-related errors when executing the binary.
Your code above is standards compliant. As explained in the comments, lack of explicit interfaces for subroutines is not good practice, but for these simple subroutines it's not against the rules.
Some compilers have flags that allow you to change where some objects are allocated in memory. While it may fix a particular issue, flags are compiler dependent, and usually not equivalent when comparing different compilers. Using dynamic memory via allocatables is a more robust solution, according to my experience.
Finally, note that, if you are using OpenMP, the ulimit command above only affects the master thread - you need to set the stack size of each of the other threads via the environment variable OMP_STACKSIZE, which cannot be unlimited. And bear in mind that non-master threads running out of stack are a problem much more difficult to diagnose, since the binary may stop without a proper Segmentation fault error.
These are not necessarily useful solutions, but below are some conditions under which the seg fault disappears. A couple of people mentioned the lack of an explicit interface (as bad practice though not technically incorrect), and it seems that this might be one key here as either of these two changes to the code gets rid of the seg fault, although it's not quite that simple, as I'll explain:
Put everything in main, with no subroutine calls
Put the type definition table in a module
Let me expand on #2 briefly. Simply taking the example in the OP and then giving it an explicit interface by putting the subroutine in a module does NOT work. However, if I put the type definition in a module and then use it (as shown below) the segfault does not occur:
program main
use table_mod
type(table) :: table1
table1%col = 1.0
end program main
I have downloaded the following fortran program dragon.f at http://www.iamg.org/documents/oldftp/VOL32/v32-10-11.zip
I need to do a minor modification to the program which requires the program to be translated to fortran90 (see below to confirm if this is truly needed).
I have managed to do this (translation only) by three different methods:
replacing comment line indicators (c for !) and line continuation
indicators (* in column 6 for & at the end of last line)
using convert.f90 (see https ://wwwasdoc.web.cern.ch/wwwasdoc/WWW/f90/convert.f90)
using f2f.pl (see https :// bitbucket.org/lemonlab/f2f/downloads)
Both 1) and 3) worked (i.e. managed to compile program) while 2) didn't work straight away.
However, after testing the program I found that the results are different.
With the fortran77 program, I get the "expected" results for the example provided with the program (the program comes with an example data "grdata.txt", and its example output "flm.txt" and "check.txt"). However, after running the translated (fortran90) program the results I get are different.
I suspect there are some issues with the way some variables are declared.
Can you give me recommendations in how to properly translate this program so I get the exact same results?
The reason I need to do it in fortran90 is because I need to input the parameters via a text file instead of modifying the program. This shouldnt be an issue for most of the parameters involved, except for the declaration of the last one, in which the size is determined from parameters that the program does not know a priori (see below):
implicit double precision(a-h,o-z)
parameter(lmax=90,imax=45,jmax=30)
parameter(dcta=4.0d0,dfai=4.0d0)
parameter(thetaa=0.d0,thetab=180.d0,phaia=0.d0,phaib=120.d0)
dimension f(0:imax,0:jmax),coe(imax,jmax,4),coew(4),fw(4)
So for example, I will read lmax, imax, jmax, dcta, dfai, thetaa, thetab, phaia, and phaib and the program needs to declare f and coe but as far as I read after googling this issue, they cannot be declared with an unknown size in fortran77.
Edit: This was my attempt to do this modification:
character fname1*100
call getarg(1,fname1)
open(10,file=fname1)
read(10,*)lmax,imax,jmax,dcta,dfai,thetaa,thetab,phaia,phaib
close(10)
So the program will read these constants from a file (e.g. params.txt), where the name of the file is supplied as an argument when invoking the program. The problem when I do this is that I do not know how to modify the line
dimension f(0:imax,0:jmax)...
in order to declare this array when the values imax and jmax are not known when compiling the program (they depend on the size of the data that the user will use).
As has been pointed out in the comments above, parameters cannot be read from file since they are set at compile time. Read them in as integer, declare the arrays as allocatable, and then allocate.
integer imax,jmax
real(8), allocatable :: f(:,:),coe(:,:,:)
read(10,*) imax,jmax
allocate(f(0:imax,0:jmax),coe(imax,jmax,4))
I found out that the differences in the results were attributed to using different compilers.
PS I ended up adding a lot more code than I intended at the beginning to allow reading data from netcdf files. This program in particular is really helpful for spherical harmonic expansion. [tag:spherical harmonics]
I've converted some legacy Fortran code to C using the f2c converter (f2c), and I've created a Visual Studio 10 solution on Windows 7 (64-bit). I've also had to link my C++ program (test.cpp, containing my main function) with the f2c library (built on my system using nmake).
The program runs, but once the end of the main function is reached, I receive the following Debug error:
Stack around the variable 'qq' was corrupted
Stack around the variable 'pf' was corrupted
Stack around the variable 'ampls' was corrupted
I am wondering if this might be due to a "correction" made by the f2c converter in the converted C (from Fortran) file:
/* Parameter adjustments */
--x1;
--xabs;
--ximag;
--xreal;
--work4;
--work3;
--work2;
--work1;
--ampls;
--pf;
--qq;
--tri;
This seems a bit odd, since all of these variables are C arrays, and I think that the f2c program is simply doing some pointer arithmetic so that index 0 in the array becomes index 1, in a similar fashion to Fortran.
I don't know if this could also be due to something going wrong with the converted code accessing an element of the array that has not been allocated.
What is the best way to debug this error and fix it?
Potential reasons:
This error is usually related to writing outside the bounds of an array (dynamic or static array). This error can occur by writing\getting a value in a -ve index or index >= size_of_array.
This error also accurs if your pointer is not set to its correct location. (e.g. ptr = 0, ptr = 55, points to deleted (released; or has been free) memory, or any invalid address)
Best way to debug your error in my mind is to debug your prorgam step-by-step and watch those pointer values. There must be some wrong with them.
What you say could be true. I would suggest to create a very small program that uses an array and decrements the pointer exactly as f2c does. Something like
int aa[10];
int *pa = aa;
--pa;
pa[1] = ...
That is, test the suspected code in the small scale. You might isolate the cause to the problem this way. (Finding a workaround is a different story)
Are you compiling with the debug versions of the crt? That might give you some more information.
Also, is it possible that your library is built as C and your application is written as C++?
Those errors you mention are sometimes because of different calling conventions. You do state it's a 64bit application, so it shouldn't be an issue (all 64bit apps use the same calling convention), but it's worth looking into.
Is it possible to add all the fortran converted code to visual studio and not do a seperate make?
I'm trying to get a legacy FORTRAN code working by building it from source using gfortran. I have finally been able to build it successfully, but now I'm getting an out-of-bounds error when it runs. I used gdb and traced the error to a function that uses the loc() intrinsic. When I try to print the value of loc(ae), with ae being my integer value being passed, I get the error "No symbol "loc" in current context." I tried compiling with ifort 11.x and debugged with DDT and got the same error. To me, this means that the compiler knows nothing of the intrinsic.
A little reading revealed that the loc intrinsic wasn't part of the F77 standard, so maybe that's part of the problem. I posted the definition of the intrinsic below, but I don't know how I can implement that into my code so loc() can be used.
Any advice or am I misinterpreting my problem? Because both gfortran and ifort crash in the same place due to an out of bounds error, but the function utilizing loc() returns the same large number between both compilers. It seems a bit strange that loc() wouldn't be working if both compilers shoot back the same value for loc.
Usage:
iaddr = loc(obj)
Where:
obj
is a variable, array, function or subroutine whose address is wanted.
iaddr
is an integer with the address of "obj". The address is in the same
format as stored by an LARn
instruction.
Description:
LOC is used to obtain the address of
something. The value returned is not
really useful within Fortran, but may
be needed for GMAP subroutines, or
very special debugging.
Well, no, the fact that it compiles means that loc is known by the compiler; the fact that gdb doesn't know about it just means it's just not known by the debugger (which probably doesn't know the matmult intrinsic, either).
loc is a widely-available non-standard extension. I hate those. If you want something standard that should work everywhere, c_loc, which is part of the C<->Fortran interoperability standard in Fortran2003, is something you could use. It returns a pointer that can be passed to C routines.
How is the value from the loc call being used?
Gfortran loc seems to work a bit differently with arrays to that of some other compilers. If you are using it to eg check for array copies or such then it can be better to do loc of the first element loc(obj(1,1)) or similar. This is equivalent to what loc does I think with intel, but in gfortran it gives instead some other address (so two arrays which share exactly the same memory layout have different loc results).