I've having trouble with some old code used for research that I would like to compile using the Intel Fortran compiler. In a particular subroutine, I get segmentation faults unless I add in a write statement that just outputs the value of the loop index.
do j=1,ne
SOME STUFF
write(*,*) 'j=', j
end
What could be causing my error such that this write statement would fix my segmentation fault? (Note: j is declared as an integer)
thanks,
keely
Classic ways of causing this type of error which is 'fixed' by inserting write statements:
walking off the end of an array -- use your compiler to switch on bounds-checking and debugging options to check for this;
disagreement between arguments provided to a sub-program and arguments expected. Again, use your compiler if possible, your eyes otherwise.
Odds are 5-to-1 that one of these is the cause.
Related
I have the code kind of like this in F90:
real(8), dimension(10,10,10) :: A
do i = 1, 1000
print*,A(i,1,1)
enddo
I'm very surprised this worked and it's faster than simply looping over 3 dimensions by i,j,k.
Can someone please explain why this works?
Your code is illegal. But under the hood the memory layout of the array happens to be in the column major order as in the triple loop k,j,i, so the code appears to work. But it is illegal.
If you enable runtime error checks in your compiler (see the manual), it will find the error and report it.
It may be slightly faster, if you do not enable compiler optimizations, because there is some overhead in nested loops, but an optimizing compiler will optimize the code to one loop.
If you actually did (you should always show your code!!!)
do i=1,10
do j=1,10
do k=1,10
something with A(i,j,k)
then please note that that is the wrong order and you should loop in the k,j,i order.
Also please note that measuring the speed of printing to the screen is not useful and can be very tricky. Some mathematical operations are more useful.
So i have a program, which has something like this in it:
integer :: mgvn, stot, gutot, iprint, iwrit, ifail, iprnt
...
call readbh(lubnd,nbset,nchan,mgvn,stot,gutot,nstat,nbound,rr,bform,iprnt,iwrit,ifail)
And then inside readbh:
CALL GETSET(LUBND,NSET,KEYBC,BFORM,IFAIL)
IF(IFAIL.NE.0) GO TO 99
...
99 WRITE(IWRITE,98) NBSET,LUBND
IFAIL = 1
RETURN
Where all the other variables are defined, but ifail is not. If i add in write(*,*) ifail before the function call, i get the undefined variable error, but if i leave it out, it doesn't complain, and just runs away with the function, and always fails, with IFAIL=1.
Is this because it's just getting to the end of the arguments in the readbh function, reading in uninitialised memory - which is just random jibberish - and then casting those bits to an int - which is not going to be zero unless i'm very (un)lucky, and so nearly always making ifail.ne.0 true?
I'll choose to interpret what you call undefined variable as uninitialised variable. Generally speaking Fortran, and many other compiled programming languages, will quite happily carry on computing with uninitialised variables. It/they are programming languages for grown-ups, it's on our own head if you program this sort of behaviour. It is not syntactically incorrect to write a Fortran program which uses uninitialised variables so a compiler is not bound by the language standard to raise a warning or error.
Fortran does, though, have the facility for you to program functions and subroutines to ensure that output arguments are given values. If you use the intent(out) attribute on arguments which ought to have values assigned to them inside a procedure, then the compiler will check that an assignment is made and raise an error if one is not.
Most compilers have an option to implement run-time checking for use of uninitialised variables. Intel Fortran, for example, has the flag -check:uninit. Without this check, yes, your program will interpret whatever pattern of bits it finds in the region of memory labelled ifail as an integer and carry on.
You write that your function always fails with ifail == 1. From what you've shown us ifail is, just prior to the return at (presumably) the end of the call to readbh, unconditionally set to 1.
From what you've revealed of your code it looks to me as if ifail is intended as an error return code from getset so it's not necessarily wrong that it is uninitialised on entry to that subroutine. But it is a little puzzling that readbh then sets it to 1 before returning.
I'm trying to get a legacy FORTRAN code working by building it from source using gfortran. I have finally been able to build it successfully, but now I'm getting an out-of-bounds error when it runs. I used gdb and traced the error to a function that uses the loc() intrinsic. When I try to print the value of loc(ae), with ae being my integer value being passed, I get the error "No symbol "loc" in current context." I tried compiling with ifort 11.x and debugged with DDT and got the same error. To me, this means that the compiler knows nothing of the intrinsic.
A little reading revealed that the loc intrinsic wasn't part of the F77 standard, so maybe that's part of the problem. I posted the definition of the intrinsic below, but I don't know how I can implement that into my code so loc() can be used.
Any advice or am I misinterpreting my problem? Because both gfortran and ifort crash in the same place due to an out of bounds error, but the function utilizing loc() returns the same large number between both compilers. It seems a bit strange that loc() wouldn't be working if both compilers shoot back the same value for loc.
Usage:
iaddr = loc(obj)
Where:
obj
is a variable, array, function or subroutine whose address is wanted.
iaddr
is an integer with the address of "obj". The address is in the same
format as stored by an LARn
instruction.
Description:
LOC is used to obtain the address of
something. The value returned is not
really useful within Fortran, but may
be needed for GMAP subroutines, or
very special debugging.
Well, no, the fact that it compiles means that loc is known by the compiler; the fact that gdb doesn't know about it just means it's just not known by the debugger (which probably doesn't know the matmult intrinsic, either).
loc is a widely-available non-standard extension. I hate those. If you want something standard that should work everywhere, c_loc, which is part of the C<->Fortran interoperability standard in Fortran2003, is something you could use. It returns a pointer that can be passed to C routines.
How is the value from the loc call being used?
Gfortran loc seems to work a bit differently with arrays to that of some other compilers. If you are using it to eg check for array copies or such then it can be better to do loc of the first element loc(obj(1,1)) or similar. This is equivalent to what loc does I think with intel, but in gfortran it gives instead some other address (so two arrays which share exactly the same memory layout have different loc results).
I'm trying to write a C++ program that utilizes a few tens of thousands of lines of Fortran 77 code, but running into some strange errors. I'm passing three coordinates (x,y,z) and the address of three vectors from C++ into fortran, then having fortran run some computations on the initial points and return results in the three vectors.
I do this a few hundred times in a C++ function, leave that function, and then come back to do it again. It works perfectly the first time through, but the second time through it stops returning useful results (returns nan) for points with a positive x component.
Initially it seems like an algorithm problem, except for three things:
It works perfectly the first 200 times I run it
It works if I call it from fortran and eliminate C++ altogether (not viable for the final program)
I've tried adding print statements to fortran to debug where it goes wrong, but turns out if I add print statments to a specific subroutine (even something as simple as PRINT *,'Here'), the program starts returning NaNs even on the first run.
This is why I think it's something to do with how memory is being allocated and deallocated between C and fortran function/subroutine calls. The basic setup looks like this:
C++:
void GetPoints(void);
extern"C"
{
void getfield_(float*,float*,float*,float[],float[],float[],int*,int*);
}
int main(void)
{
GetPoints(); //Works
GetPoints(); //Doesn't
}
void GetPoints(void)
{
float x,y,z;
int i,n,l;
l=50;
n=1;
x=y=z=0.0;
float xx[l],yy[l],zz[l]
for(i=0;i<l;i++)
getfield_(&x,&y,&z,xx,yy,zz,&n,&l);
//Store current xx,yy,zz in large global array
}
Fortran:
SUBROUTINE GETFIELD(XI,YI,ZI,XX,YY,ZZ,IIN,NP)
DIMENSION XX(NP),YY(NP),ZZ(NP)
EXTERNAL T89c
T89c(XI,YI,ZI,XX,YY,ZZ)
RETURN
END
!In T89c.f
SUBROUTINE T89c(XI,YI,ZI,XX,YY,ZZ)
COMMON /STUFF/ ARRAY(100)
!Lots of calculations
!Calling ~20 other subroutines
RETURN
END
Do any of you see any glaring memory issues that I'm creating? Perhaps common blocks that fortran thinks exist but are really deallocated by C++? Without the ability to debug using print statements, nor the time to try to understand the few thousand lines of someone else's Fortran 77 code, I'm open to trying just about anything you all can suggest or think of.
I'm using g++ 4.5.1 for compiling the C++ code and final linking, and gfortran 4.5.1 for compiling the fortran code.
Thanks
**Edit:**
I've tracked the error down to some obscure piece of the code that was written before I was even born. It appears it's looking for some common variable that got removed in the updates over the years. I don't know why it only affected one dimension, nor why the bug was replicatable by adding a print statement, but I've eliminated it nonetheless. Thank you all for the help.
You may be running into the "off-by-one" error. Fortran arrays are 1-based, while C arrays are 0-based. Make sure the array sizes you pass into Fortran are not 1 less than they should be.
Edit:
I guess it looks right... Still, I would try allocating 51 elements in the C++ function, just to see what happens.
By the way float xx[l]; is not standard. This is a gcc feature. Normally you should be allocating memory with new here, or you should be using std::vector.
Also, I am confused by the call to getfield_ in the loop. Shouldn't you be passing i to getfield_?
You should declare XX, YY and ZZ as arrays also in the subroutine T89c as follows:
REAL*4 XX(*)
REAL*4 YY(*)
REAL*4 ZZ(*)
C/C++ should in general never deallocate any Fortran common blocks. These are like structs in C (i.e. memory is reserved at compile time, not at runtime).
For some reason, gfortran seems to accept the following in T89c even without the above declarations:
print *,XX(1)
during compilation but when executing it I get a segmentation fault.
I've having trouble with some old code used for research that I would like to compile using the Intel Fortran compiler. In a particular subroutine, I get segmentation faults unless I add in a write statement that just outputs the value of the loop index.
do j=1,ne
SOME STUFF
write(*,*) 'j=', j
end
What could be causing my error such that this write statement would fix my segmentation fault? (Note: j is declared as an integer)
thanks,
keely
Classic ways of causing this type of error which is 'fixed' by inserting write statements:
walking off the end of an array -- use your compiler to switch on bounds-checking and debugging options to check for this;
disagreement between arguments provided to a sub-program and arguments expected. Again, use your compiler if possible, your eyes otherwise.
Odds are 5-to-1 that one of these is the cause.