I'm trying to write a C++ program that utilizes a few tens of thousands of lines of Fortran 77 code, but running into some strange errors. I'm passing three coordinates (x,y,z) and the address of three vectors from C++ into fortran, then having fortran run some computations on the initial points and return results in the three vectors.
I do this a few hundred times in a C++ function, leave that function, and then come back to do it again. It works perfectly the first time through, but the second time through it stops returning useful results (returns nan) for points with a positive x component.
Initially it seems like an algorithm problem, except for three things:
It works perfectly the first 200 times I run it
It works if I call it from fortran and eliminate C++ altogether (not viable for the final program)
I've tried adding print statements to fortran to debug where it goes wrong, but it turns out that if I add print statements to a specific subroutine (even something as simple as PRINT *,'Here'), the program starts returning NaNs even on the first run.
This is why I think it's something to do with how memory is being allocated and deallocated between C and fortran function/subroutine calls. The basic setup looks like this:
C++:
void GetPoints(void);
extern"C"
{
void getfield_(float*,float*,float*,float[],float[],float[],int*,int*);
}
int main(void)
{
GetPoints(); //Works
GetPoints(); //Doesn't
}
void GetPoints(void)
{
float x,y,z;
int i,n,l;
l=50;
n=1;
x=y=z=0.0;
float xx[l],yy[l],zz[l];
for(i=0;i<l;i++)
getfield_(&x,&y,&z,xx,yy,zz,&n,&l);
//Store current xx,yy,zz in large global array
}
Fortran:
SUBROUTINE GETFIELD(XI,YI,ZI,XX,YY,ZZ,IIN,NP)
DIMENSION XX(NP),YY(NP),ZZ(NP)
EXTERNAL T89c
CALL T89c(XI,YI,ZI,XX,YY,ZZ)
RETURN
END
!In T89c.f
SUBROUTINE T89c(XI,YI,ZI,XX,YY,ZZ)
COMMON /STUFF/ ARRAY(100)
!Lots of calculations
!Calling ~20 other subroutines
RETURN
END
Do any of you see any glaring memory issues that I'm creating? Perhaps common blocks that fortran thinks exist but are really deallocated by C++? Without the ability to debug using print statements, nor the time to try to understand the few thousand lines of someone else's Fortran 77 code, I'm open to trying just about anything you all can suggest or think of.
I'm using g++ 4.5.1 for compiling the C++ code and final linking, and gfortran 4.5.1 for compiling the fortran code.
Thanks
**Edit:**
I've tracked the error down to some obscure piece of the code that was written before I was even born. It appears it's looking for some common variable that got removed in the updates over the years. I don't know why it only affected one dimension, nor why the bug was replicatable by adding a print statement, but I've eliminated it nonetheless. Thank you all for the help.
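For anyone chasing a similar problem without being able to add print statements on the Fortran side, one option is to check the returned arrays on the C++ side right after each call. The following is only a minimal sketch: `first_non_finite` is a hypothetical helper name, and it assumes the results come back as plain `float` arrays like `xx`, `yy` and `zz` above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Returns the index of the first NaN or infinite entry in values[0..n-1],
// or -1 if every entry is finite.
// Hypothetical helper, not part of the original program.
long first_non_finite(const float* values, std::size_t n)
{
    assert(values != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (!std::isfinite(values[i])) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Calling it as `first_non_finite(xx, l)` after each `getfield_` call would show on which call, and at which index, the results first go bad, without touching the Fortran code.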
You may be running into the "off-by-one" error. Fortran arrays are 1-based, while C arrays are 0-based. Make sure the array sizes you pass into Fortran are not 1 less than they should be.
Edit:
I guess it looks right... Still, I would try allocating 51 elements in the C++ function, just to see what happens.
By the way float xx[l]; is not standard. This is a gcc feature. Normally you should be allocating memory with new here, or you should be using std::vector.
Also, I am confused by the call to getfield_ in the loop. Shouldn't you be passing i to getfield_?
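The std::vector suggestion could look roughly like the sketch below. To keep it self-contained, `getfield_stub` is a hypothetical stand-in written in C++ with the same argument list as `getfield_`; in the real program you would call the `extern "C"` Fortran routine instead. `FieldLine` and `trace_point` are also made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Fortran routine so this sketch links on its own.
// It just fills the output arrays with easily recognizable values.
void getfield_stub(float* x, float* y, float* z,
                   float xx[], float yy[], float zz[], int* n, int* l)
{
    (void)n;  // unused in the stand-in
    for (int i = 0; i < *l; ++i) {
        xx[i] = *x + static_cast<float>(i);
        yy[i] = *y;
        zz[i] = *z;
    }
}

// Owns the three result arrays; the storage is released automatically.
struct FieldLine {
    std::vector<float> xx, yy, zz;
};

FieldLine trace_point(float x, float y, float z, int length)
{
    assert(length > 0);
    const std::size_t count = static_cast<std::size_t>(length);
    FieldLine line{std::vector<float>(count),
                   std::vector<float>(count),
                   std::vector<float>(count)};
    int n = 1;
    // data() gives the contiguous buffer that the Fortran side expects.
    getfield_stub(&x, &y, &z,
                  line.xx.data(), line.yy.data(), line.zz.data(),
                  &n, &length);
    return line;
}
```

Since a vector's storage is contiguous, passing `data()` is equivalent to passing the old `xx` array, but the size is no longer tied to a gcc extension.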
You should declare XX, YY and ZZ as arrays also in the subroutine T89c as follows:
REAL*4 XX(*)
REAL*4 YY(*)
REAL*4 ZZ(*)
C/C++ should in general never deallocate any Fortran common blocks. These are like structs in C (i.e. memory is reserved at compile time, not at runtime).
For some reason, gfortran seems to accept the following in T89c even without the above declarations:
print *,XX(1)
during compilation but when executing it I get a segmentation fault.
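To illustrate the point about common blocks in C++ terms: a common block behaves like a single object with static storage, so whatever the first pass leaves in it is still there on the second pass. This is only an analogy with made-up names (`Stuff`, `accumulate`), not how T89c actually uses its common block.

```cpp
#include <cassert>

// Illustrative analog of COMMON /STUFF/ ARRAY(100): one object with static
// storage duration. It is reserved when the program is built, not allocated
// or freed at run time.
struct Stuff {
    float array[100];
    int calls;
};

static Stuff stuff = {};  // zero-initialized once, before main starts

// Hypothetical routine that keeps a running total in the shared storage.
float accumulate(int index, float value)
{
    assert(index >= 0 && index < 100);
    stuff.array[index] += value;  // state left over from earlier calls is reused
    ++stuff.calls;
    return stuff.array[index];
}
```

If a routine relies on such storage being in its initial state, the first batch of calls can work while later ones see stale values, which would be consistent with "works the first time, fails the second".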
Related
I'm having trouble with some old code used for research that I would like to compile using the Intel Fortran compiler. In a particular subroutine, I get segmentation faults unless I add in a write statement that just outputs the value of the loop index.
do j=1,ne
SOME STUFF
write(*,*) 'j=', j
end do
What could be causing my error such that this write statement would fix my segmentation fault? (Note: j is declared as an integer)
thanks,
keely
Classic ways of causing this type of error which is 'fixed' by inserting write statements:
walking off the end of an array -- use your compiler to switch on bounds-checking and debugging options to check for this;
disagreement between arguments provided to a sub-program and arguments expected. Again, use your compiler if possible, your eyes otherwise.
Odds are 5-to-1 that one of these is the cause.
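For the first cause, the compiler can do the checking for you (for example gfortran's `-fcheck=bounds` or ifort's `-check bounds`). The effect is similar to what checked element access does in C++, sketched below. `checked_store` is a hypothetical helper used only for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stores value at index and reports whether the index was valid.
// With plain data[index] an out-of-range write would silently corrupt
// neighbouring memory; at() turns it into a detectable error instead.
bool checked_store(std::vector<float>& data, std::size_t index, float value)
{
    try {
        data.at(index) = value;  // throws std::out_of_range if index >= size()
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

An unchecked out-of-range write is exactly the kind of bug that appears or disappears when an unrelated write statement changes the memory layout.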
When I run certain programs using FORTRAN 77 with the GNU gfortran compiler I have come across the same problem several times and I'm hoping someone has insight. The value I want should be ~ 1, but it writes out at the end of a program as something well over 10^100. This is generally a problem restricted to arrays for me. The improper value often goes away when I write out the value of this at some stage in the program before (which inevitably happens in trying to troubleshoot the issue).
I have tried initializing arrays explicitly and have tried some array bound checking as well as some internal logical checks in my program. The issue, to me and my research supervisor, seems to be pathological.
WRITE(*,*)"OK9999", INV,INVV
vs
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
The former gives incorrect results for what I would expect for the variables INV and INVV, the latter correct. This is the 'newest' example of this problem, which has on and off affected me for about a year.
The greater context of these lines is:
WRITE(*,*)"AFTER ENERGY",I1STOR(1),I2STOR(1)
DO 167 NP=1!,NV
IF(I1STOR(NP).NE.0) THEN
INV = I1STOR(NP)
INVV = I2STOR(NP)
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
PAUSE
ENDIF
I1STOR(1) and I2STOR(1) are written correctly in the first case "AFTER ENERGY" above. If I write out the value of NP after the DO 167 line, this will also remedy the situation.
My expectation would be that writing out a variable wouldn't affect its value. Often times I am doing large, time-intensive calculations where the ultimate value is way off and in many cases it has traced back to the situation where writing the value out (to screen or file) magically alleviates the problem. Any help would be sincerely appreciated.
I have downloaded the following fortran program dragon.f at http://www.iamg.org/documents/oldftp/VOL32/v32-10-11.zip
I need to do a minor modification to the program which requires the program to be translated to fortran90 (see below to confirm if this is truly needed).
I have managed to do this (translation only) by three different methods:
1) replacing comment line indicators (c for !) and line continuation indicators (* in column 6 for & at the end of last line)
2) using convert.f90 (see https://wwwasdoc.web.cern.ch/wwwasdoc/WWW/f90/convert.f90)
3) using f2f.pl (see https://bitbucket.org/lemonlab/f2f/downloads)
Both 1) and 3) worked (i.e. managed to compile program) while 2) didn't work straight away.
However, after testing the program I found that the results are different.
With the fortran77 program, I get the "expected" results for the example provided with the program (the program comes with an example data "grdata.txt", and its example output "flm.txt" and "check.txt"). However, after running the translated (fortran90) program the results I get are different.
I suspect there are some issues with the way some variables are declared.
Can you give me recommendations in how to properly translate this program so I get the exact same results?
The reason I need to do it in fortran90 is that I need to input the parameters via a text file instead of modifying the program. This shouldn't be an issue for most of the parameters involved, except for the declaration of the last one, whose size is determined from parameters that the program does not know a priori (see below):
implicit double precision(a-h,o-z)
parameter(lmax=90,imax=45,jmax=30)
parameter(dcta=4.0d0,dfai=4.0d0)
parameter(thetaa=0.d0,thetab=180.d0,phaia=0.d0,phaib=120.d0)
dimension f(0:imax,0:jmax),coe(imax,jmax,4),coew(4),fw(4)
So for example, I will read lmax, imax, jmax, dcta, dfai, thetaa, thetab, phaia, and phaib and the program needs to declare f and coe but as far as I read after googling this issue, they cannot be declared with an unknown size in fortran77.
Edit: This was my attempt to do this modification:
character fname1*100
call getarg(1,fname1)
open(10,file=fname1)
read(10,*)lmax,imax,jmax,dcta,dfai,thetaa,thetab,phaia,phaib
close(10)
So the program will read these constants from a file (e.g. params.txt), where the name of the file is supplied as an argument when invoking the program. The problem when I do this is that I do not know how to modify the line
dimension f(0:imax,0:jmax)...
in order to declare this array when the values imax and jmax are not known when compiling the program (they depend on the size of the data that the user will use).
As has been pointed out in the comments above, parameters cannot be read from file since they are set at compile time. Read them in as integer, declare the arrays as allocatable, and then allocate.
integer imax,jmax
real(8), allocatable :: f(:,:),coe(:,:,:)
read(10,*) imax,jmax
allocate(f(0:imax,0:jmax),coe(imax,jmax,4))
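For readers who think in C++ terms, the same idea (read the sizes at run time, then allocate) can be sketched as follows. All names here (`Grid`, `read_grid`, `grid_from_text`) are hypothetical, and the sketch only mirrors the shape of `f(0:imax,0:jmax)`; it is not a translation of dragon.f.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Dense 2-D grid covering indices 0..imax by 0..jmax, like f(0:imax,0:jmax).
struct Grid {
    int imax;
    int jmax;
    std::vector<double> values;

    double& at(int i, int j)
    {
        assert(0 <= i && i <= imax && 0 <= j && j <= jmax);
        // Column-major layout as in Fortran: the first index varies fastest.
        const std::size_t rows = static_cast<std::size_t>(imax) + 1;
        return values[static_cast<std::size_t>(j) * rows
                      + static_cast<std::size_t>(i)];
    }
};

// Reads imax and jmax from the stream, then allocates the storage.
Grid read_grid(std::istream& in)
{
    int imax = 0;
    int jmax = 0;
    in >> imax >> jmax;
    assert(in && imax >= 0 && jmax >= 0);
    const std::size_t count = (static_cast<std::size_t>(imax) + 1)
                            * (static_cast<std::size_t>(jmax) + 1);
    return Grid{imax, jmax, std::vector<double>(count, 0.0)};
}

// Convenience wrapper for trying the sketch without a parameter file.
Grid grid_from_text(const std::string& text)
{
    std::istringstream in(text);
    return read_grid(in);
}
```

As in the Fortran version, nothing about the array size is fixed at compile time; `grid_from_text("45 30")` gives the same extents as the original `parameter` statement.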
I found out that the differences in the results were attributed to using different compilers.
PS: I ended up adding a lot more code than I intended at the beginning to allow reading data from NetCDF files. This program in particular is really helpful for spherical harmonic expansion.
I'm trying to get a legacy FORTRAN code working by building it from source using gfortran. I have finally been able to build it successfully, but now I'm getting an out-of-bounds error when it runs. I used gdb and traced the error to a function that uses the loc() intrinsic. When I try to print the value of loc(ae), with ae being my integer value being passed, I get the error "No symbol "loc" in current context." I tried compiling with ifort 11.x and debugged with DDT and got the same error. To me, this means that the compiler knows nothing of the intrinsic.
A little reading revealed that the loc intrinsic wasn't part of the F77 standard, so maybe that's part of the problem. I posted the definition of the intrinsic below, but I don't know how I can implement that into my code so loc() can be used.
Any advice or am I misinterpreting my problem? Because both gfortran and ifort crash in the same place due to an out of bounds error, but the function utilizing loc() returns the same large number between both compilers. It seems a bit strange that loc() wouldn't be working if both compilers shoot back the same value for loc.
Usage:
iaddr = loc(obj)
Where:
obj
is a variable, array, function or subroutine whose address is wanted.
iaddr
is an integer with the address of "obj". The address is in the same
format as stored by an LARn
instruction.
Description:
LOC is used to obtain the address of
something. The value returned is not
really useful within Fortran, but may
be needed for GMAP subroutines, or
very special debugging.
Well, no. The fact that it compiles means that loc is known by the compiler; the fact that gdb doesn't know about it only means that the debugger doesn't know it (it probably doesn't know the matmul intrinsic, either).
loc is a widely-available non-standard extension. I hate those. If you want something standard that should work everywhere, c_loc, which is part of the C<->Fortran interoperability standard in Fortran 2003 (the iso_c_binding module), is something you could use. It returns a pointer that can be passed to C routines.
How is the value from the loc call being used?
Gfortran's loc seems to work a bit differently with arrays than that of some other compilers. If you are using it to, e.g., check for array copies, it can be better to take loc of the first element, loc(obj(1,1)) or similar. I think this is equivalent to what loc does with Intel, but gfortran gives some other address instead (so two arrays which share exactly the same memory layout have different loc results).
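The "compare the address of the first element" check can be sketched in C++ as below, as an analogy only. `first_element_address` and `same_storage` are hypothetical names; the point is that the comparison is done on the first element rather than on the array object as a whole.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Address of the first element as an integer, the analog of loc(obj(1,1)).
std::uintptr_t first_element_address(const std::vector<double>& a)
{
    assert(!a.empty());
    return reinterpret_cast<std::uintptr_t>(&a[0]);
}

// True when both arguments refer to the same underlying storage,
// false when one is a copy of the other.
bool same_storage(const std::vector<double>& a, const std::vector<double>& b)
{
    return first_element_address(a) == first_element_address(b);
}
```

A reference to the same array compares equal here, while a copy does not, which is the distinction an array-copy check is after.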