I am currently writing an eigenvalue algorithm in Fortran. I am just trying to get some insight on the cause of the issue I was having. I traced the issue but I want to know how the problems are related.
Specifically, I was making a series of calls to the BLAS routines DGEMV and DGER as follows:
call DGEMV('T', ROWS, COLUMNS, 1.0_dp, updates(j,k), LEADING_DIM, v, 1, 0.0_dp, w, 1)
call DGER(ROWS, COLUMNS, -2.0_dp, v, 1, w, 1, updates(j,k), LEADING_DIM)
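The problem was that my indices j and k, which mark where the submatrix operation begins, were incorrect. After the above code executed, there was no error, even with bounds checking turned on. However, a completely unrelated variable that was properly passed as 'intent(in)' was changed instead. After correcting the indices the problem no longer occurred.
When you access arrays out of bounds, anything can happen. Here the routines receive only the address of updates(j,k) and a leading dimension, so with a wrong starting index they write past the end of the array into whatever memory follows, which can change an unrelated variable or trigger other seemingly random errors. Compiler bounds checking typically cannot catch this, because the writes happen inside the precompiled library.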
The program is not standard conforming and its behaviour is undefined. You can't expect anything.
Related
I'm having trouble with some old research code that I would like to compile using the Intel Fortran compiler. In a particular subroutine, I get segmentation faults unless I add a write statement that just outputs the value of the loop index.
do j=1,ne
   ! SOME STUFF
   write(*,*) 'j=', j
end do
What could be causing my error such that this write statement would fix my segmentation fault? (Note: j is declared as an integer)
thanks,
keely
Classic ways of causing this type of error which is 'fixed' by inserting write statements:
walking off the end of an array -- use your compiler to switch on bounds-checking and debugging options to check for this;
disagreement between arguments provided to a sub-program and arguments expected. Again, use your compiler if possible, your eyes otherwise.
Odds are 5-to-1 that one of these is the cause.
When I run certain FORTRAN 77 programs compiled with GNU gfortran, I keep coming across the same problem, and I'm hoping someone has insight. The value I want should be ~ 1, but at the end of the program it is written out as something well over 10^100. For me this is generally restricted to arrays. The improper value often goes away when I write the value out at some earlier stage in the program (which inevitably happens while troubleshooting the issue).
I have tried initializing arrays explicitly and have tried some array bound checking as well as some internal logical checks in my program. The issue, to me and my research supervisor, seems to be pathological.
WRITE(*,*)"OK9999", INV,INVV
vs
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
The former gives incorrect results for what I would expect for the variables INV and INVV, the latter correct. This is the 'newest' example of this problem, which has on and off affected me for about a year.
The greater context of these lines is:
WRITE(*,*)"AFTER ENERGY",I1STOR(1),I2STOR(1)
DO 167 NP=1!,NV
IF(I1STOR(NP).NE.0) THEN
INV = I1STOR(NP)
INVV = I2STOR(NP)
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
PAUSE
ENDIF
I1STOR(1) and I2STOR(1) are written correctly in the first case "AFTER ENERGY" above. If I write out the value of NP after the DO 167 line, this will also remedy the situation.
My expectation would be that writing out a variable wouldn't affect its value. Oftentimes I am doing large, time-intensive calculations where the ultimate value is way off, and in many cases it has traced back to a situation where writing the value out (to screen or file) magically alleviates the problem. Any help would be sincerely appreciated.
I have code like this in F90:
real(8), dimension(10,10,10) :: A
do i = 1, 1000
print*,A(i,1,1)
enddo
I'm very surprised this worked and it's faster than simply looping over 3 dimensions by i,j,k.
Can someone please explain why this works?
Your code is illegal. But under the hood the memory layout of the array happens to be in the column major order as in the triple loop k,j,i, so the code appears to work. But it is illegal.
If you enable runtime error checks in your compiler (see the manual), it will find the error and report it.
It may be slightly faster, if you do not enable compiler optimizations, because there is some overhead in nested loops, but an optimizing compiler will optimize the code to one loop.
If you actually did (you should always show your code!!!)
do i=1,10
  do j=1,10
    do k=1,10
      ! something with A(i,j,k)
then please note that this is the wrong order: the first index should vary fastest, so you should loop in the k,j,i order (k outermost, i innermost).
Also please note that measuring the speed of printing to the screen is not useful and can be very tricky. Some mathematical operations are more useful.
I am implementing a genetic algorithm to numerically estimate some coefficients in a system of ODEs based on experimental data. I am learning Fortran as I implement the algorithms. My setup is Intel Fortran 2015 running on Windows 7/Visual Studio 2013, on an old i7 processor.
I have the following piece among a multitude of lines of code:
DO i = 1, N_CROMOSOMES
IF (population(9,i) < 0.0_DOUBLE) population(9, i) = square_error(population(1:8, i))
END DO
Where I just defined DOUBLE to be:
INTEGER, PARAMETER :: DOUBLE = 16
N_CROMOSOMES is an INTEGER argument to the function, that defines the size of the array population, which in turn is a (9 x N_CROMOSOMES) array of type REAL(KIND=DOUBLE). For each column on this array, its first 8 elements represent the 8 coefficients that I am estimating, and the ninth element is the error associated with that particular 8 guesses for the coefficients. square_error is the function that determines it.
At this point in the program, I have marked columns that were just created or altered as having an error of -1. Hence the "IF (population(9,i)<0.0_DOUBLE)": I am checking for the columns of the array whose error is -1 in order to compute their error.
The thing is, I just rewrote most of my code and spent the past few days correcting mysterious bugs. Before this, the code worked just fine with a FORALL instead of DO. Now it gives a "stack overflow" error when I use FORALL, but works with DO, although it takes a lot more time to do its job.
Does anyone know the cause of this, and how to solve it? It is clear to me that my code could benefit greatly from parallelization, but I am not sure how to do it.
Thanks for your time.
I'm trying to write a C++ program that utilizes a few tens of thousands of lines of Fortran 77 code, but running into some strange errors. I'm passing three coordinates (x,y,z) and the address of three vectors from C++ into fortran, then having fortran run some computations on the initial points and return results in the three vectors.
I do this a few hundred times in a C++ function, leave that function, and then come back to do it again. It works perfectly the first time through, but the second time through it stops returning useful results (returns nan) for points with a positive x component.
Initially it seems like an algorithm problem, except for three things:
It works perfectly the first 200 times I run it
It works if I call it from fortran and eliminate C++ altogether (not viable for the final program)
I've tried adding print statements to the Fortran code to find where it goes wrong, but it turns out that if I add print statements to a specific subroutine (even something as simple as PRINT *,'Here'), the program starts returning NaNs even on the first run.
This is why I think it's something to do with how memory is being allocated and deallocated between C and fortran function/subroutine calls. The basic setup looks like this:
C++:
void GetPoints(void);
extern"C"
{
void getfield_(float*,float*,float*,float[],float[],float[],int*,int*);
}
int main(void)
{
GetPoints(); //Works
GetPoints(); //Doesn't
}
void GetPoints(void)
{
    float x,y,z;
    int i,n,l;
    l=50;
    n=1;
    x=y=z=0.0;
    float xx[l],yy[l],zz[l];
    for(i=0;i<l;i++)
        getfield_(&x,&y,&z,xx,yy,zz,&n,&l);
    //Store current xx,yy,zz in large global array
}
Fortran:
      SUBROUTINE GETFIELD(XI,YI,ZI,XX,YY,ZZ,IIN,NP)
      DIMENSION XX(NP),YY(NP),ZZ(NP)
      EXTERNAL T89c
      CALL T89c(XI,YI,ZI,XX,YY,ZZ)
      RETURN
      END
!In T89c.f
      SUBROUTINE T89c(XI,YI,ZI,XX,YY,ZZ)
      COMMON /STUFF/ ARRAY(100)
!Lots of calculations
!Calling ~20 other subroutines
      RETURN
      END
Do any of you see any glaring memory issues that I'm creating? Perhaps common blocks that fortran thinks exist but are really deallocated by C++? Without the ability to debug using print statements, nor the time to try to understand the few thousand lines of someone else's Fortran 77 code, I'm open to trying just about anything you all can suggest or think of.
I'm using g++ 4.5.1 for compiling the C++ code and final linking, and gfortran 4.5.1 for compiling the fortran code.
Thanks
**Edit:**
I've tracked the error down to an obscure piece of the code that was written before I was even born. It appears to be looking for a common variable that got removed in the updates over the years. I don't know why it only affected one dimension, nor why the bug was reproducible by adding a print statement, but I've eliminated it nonetheless. Thank you all for the help.
You may be running into the "off-by-one" error. Fortran arrays are 1-based, while C arrays are 0-based. Make sure the array sizes you pass into Fortran are not 1 less than they should be.
Edit:
I guess it looks right... Still, I would try allocating 51 elements in the C++ function, just to see what happens.
By the way, float xx[l]; is not standard C++: l is not a compile-time constant, so this is a variable-length array, which gcc accepts as an extension. Normally you should allocate the memory with new here, or use std::vector.
Also, I am confused by the call to getfield_ in the loop. Shouldn't you be passing i to getfield_?
You should declare XX, YY and ZZ as arrays also in the subroutine T89c as follows:
REAL*4 XX(*)
REAL*4 YY(*)
REAL*4 ZZ(*)
C/C++ should in general never deallocate any Fortran common blocks. These are like structs in C (i.e. memory is reserved at compile time, not at runtime).
For some reason, gfortran seems to accept the following in T89c even without the above declarations:
print *,XX(1)
during compilation but when executing it I get a segmentation fault.