Fortran script only runs when print statement added - fortran

I am running an atmospheric model, and need to compile an executable to convert some files. If I compile the code as supplied, it runs but it gets stuck and doesn't ever complete. It doesn't give an error or anything like that.
After doing some testing by adding print statements to see where it was getting stuck, I've found that the executable only runs if I compile the code with a print statement in one of the subroutines.
The piece of code in question is the one here. Specifically, the code fails to run unless I put a print statement somewhere in the get_bottom_top_dim subroutine.
Does anyone know why this might be? It doesn't matter what the print statement is (currently I'm using print*, '!'), but as soon as I remove it or comment it out, the code no longer works.
I'm assuming it must have something to do with my machine or compiler (ifort 12.1.0), but I'm stumped as to what the problem is!

This is an extended comment rather than an answer:
The situation you describe, inserting a print statement which apparently fixes a program, often arises when the underlying problem is due to either
a) an attempt to access an element outside the declared bounds of an array; or
b) a mismatch between dummy and actual arguments to some procedure.
Recompile your program with the compiler options to check interfaces at compile-time and to check array bounds at run-time.

Fortran has evolved a LOT since I last used it but here's how to go about solving your problem.
Think of some hypotheses that could explain the symptoms, e.g. the compiler is optimizing the subroutine down to a no-op when it has no print side effect. Or a compiler bug is translating this code into something empty or an infinite loop or crashing code. (What exactly do you mean by "fails to run"?) Or the Linker is failing to link in some needed code unless the subroutine explicitly calls print.
Or there's a bug in this subroutine and the print statement alters its symptoms e.g. by changing which data gets overwritten by an index-out-of-bounds bug.
Think of ways to test these hypotheses. You might already have observations adequate to rule out some of them. You could disassemble the object code to see if this subroutine is empty. Or step through it in a debugger. Or replace the print statement with a different side effect like logging to a file or to an in-memory text buffer.
Or turn on all optional runtime memory checks and compile time warnings. Or simplify the code until the problem goes away, then binary search on bringing back code until the problem recurs.
Do the most likely or easiest tests first. Rule out some hypotheses, and iterate.

I had a similar bug and I found that the problem was in the dependencies in the makefile.
This was what I had:
I set a variable with a value and the program stops.
I write a print command and it works.
I delete the print statement and it continues to work.
I alter the variable value and it stops.
The thing is, the variable value is set in a parameters.f90
The print statement is in a file H3.f90 that depends on parameters.f90, but that dependency was not declared in the makefile.
After correcting:
h3.o: h3.f90 variables.f90 parameters.f90
	$(FC) -c h3.f90
It all worked properly.

Related

Dummy call makes a difference in Fortran program?

I'm working on a Fortran program and running into a strange bug with some Heisenbug-type characteristics, and I'm looking for some insight into what might be going on. The code is too large to post in full, but hopefully I can convey the general idea.
What's basically going on is I have a subroutine that reads a list of numerical parameters from a text file,
call read_parameters(filename, parameter_array)
and then this list of parameters is sent into another subroutine that runs a program using those parameter values.
call run_program(parameter_array)
These calls are part of a loop that calls run_program with slightly different parameters each time through the loop---the intention is to find better parameter sets.
I've found that on the first pass through this loop, run_program gives bizarre results, which seems to indicate that something is going wrong with the first call to read_parameters. But all the subsequent passes behave normally and I haven't been able to understand what's going wrong with that first pass despite a lot of investigating, including for example printing the values of the parameters themselves within the actual run_program code.
While testing, I realized that if I put another call to read_parameters right above the call to run_program, then the first pass of the program runs normally, but here's the thing: this new call to read_parameters is just a dummy call, with an output array parameter_array2 that doesn't even get used! As in,
call read_parameters(parameter_array)
call read_parameters(parameter_array2)
call run_program(parameter_array)
If the second line is present, the program runs just fine, even though parameter_array2 isn't used anywhere, while if it's absent the program gives erroneous results for the first pass through the loop.
Does anyone have any ideas about what might be going on?
Thanks.

Output of program depends on arbitrary print statements?

I have a Fortran 95 code whose output seems to be a function of things that it shouldn't be a function of. Specifically, the following scenario is happening:
Run code with version A; it doesn't work (I mean, it works as in it compiles and runs, but it doesn't give the result I expect)
Run code with Version B; it works. Version B contains only trivial modifications to version A such as print statements or small changes in numerical values of variables.
Run code with version A; all of a sudden, it works.
I think there's some issue with memory or with using variables before they're initialized, so I was wondering whether there is a way to check this sort of thing with gfortran, or if anyone knows what the problem might be. I've tried gfortran my_program.f95 -Wall -Wextra, but it just gives me a bunch of complaints about nonconforming tab characters.
This was a while ago, but I fixed the problem so I figured I might as well post it. To be honest, I'm not sure whether or not these steps in particular are what fixed it, but it works, so here they are:
Put all procedures in modules (this also helps to organize the code) as opposed to just "out in the open."
Declare the intent (in, out or inout) of all variables via real, intent(in) :: foo. This is obviously useful for optimization and organization but apparently it has something to do with interfaces as well ... no idea what that's about.
And that's it!

How to interpret a GDB backtrace?

0x004069f1 in Space::setPosition (this=0x77733cee, x=-65, y=-49) at space.h:44
0x00402679 in Checkers::make_move (this=0x28cbb8, move=...) at checkers.cc:351
0x00403fd2 in main_savitch_14::game::make_computer_move (this=0x28cbb8) at game.cc:153
0x00403b70 in main_savitch_14::game::play (this=0x28cbb8) at game.cc:33
0x004015fb in _fu0___ZSt4cout () at checkers.cc:96
0x004042a7 in main () at main.cc:34
Hello, I am coding a game for a class and I am running into a segfault. The checker pieces are held in a two-dimensional array, so the offending bit appears to be an invalid x/y index into the array. The moves are passed as strings, which are converted to integers, so it looks as if the x and y characters were somehow ASCII NUL. I noticed that in the function call make_move it says move=...
Why does it say move=...? Also, any other quick tips of solving a segfault? I am kind of new to GDB.
Basically, the backtrace is a trace of the calls that led to the crash. In this case:
game::play called game::make_computer_move which called Checkers::make_move which called Space::setPosition which crashed in line 44 in file space.h.
Looking at this backtrace, it appears you passed -65 and -49 to Space::setPosition. If those are invalid coordinates (and they look suspicious to me, being negative), look in the calling functions to see why they hold those values and correct them.
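One possible reading of those two numbers, offered only as a guess: if the move string is decoded by subtracting 'A' from a column letter and '1' from a row digit, then a NUL character in both positions yields exactly -65 and -49 in ASCII. The question does not show the conversion, so the decoding scheme, the 8x8 board size, and the names columnFromChar, rowFromChar and isValidSquare below are all hypothetical.

```cpp
// Hypothetical decoding of a move such as "C3": column letter minus 'A',
// row digit minus '1'. This is an assumption, not the asker's actual code.
constexpr int columnFromChar(char c) { return c - 'A'; }
constexpr int rowFromChar(char c) { return c - '1'; }

// True only when both characters map onto an (assumed) 8x8 board.
constexpr bool isValidSquare(char col, char row) {
    return columnFromChar(col) >= 0 && columnFromChar(col) < 8 &&
           rowFromChar(row) >= 0 && rowFromChar(row) < 8;
}

// With ASCII, a NUL in both positions reproduces the values in the backtrace.
static_assert(columnFromChar('\0') == -65, "NUL column decodes to -65");
static_assert(rowFromChar('\0') == -49, "NUL row decodes to -49");
```

If the real conversion looks anything like this, checking isValidSquare (or its equivalent) before indexing the array would turn the segfault into a reportable input error.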
I would suggest using assert liberally in the code to enforce contracts, pretty much any time you can say "this parameter or variable should only have values which meet certain criteria", then you should assert that it is the case.
A common example is if I have a function which takes a pointer (or more likely smart pointer) which is not allowed to be NULL. I'll have the first line of the function assert(p);. If a NULL pointer is ever passed, I know right away and can investigate.
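A minimal sketch of that kind of contract checking, assuming a simplified stand-in for the asker's classes: the Piece struct, the kBoardSize constant and this setPosition signature are made up for illustration and are not taken from the question.

```cpp
#include <cassert>

// Hypothetical stand-in for a board piece; not the asker's Space class.
struct Piece {
    int x = 0;
    int y = 0;
};

constexpr int kBoardSize = 8;  // assumed board dimension

// Contract: p must not be null and (x, y) must lie on the board.
// A violated assert aborts at the call that broke the contract, which is
// much closer to the real bug than a later segfault.
void setPosition(Piece* p, int x, int y) {
    assert(p != nullptr);
    assert(x >= 0 && x < kBoardSize);
    assert(y >= 0 && y < kBoardSize);
    p->x = x;
    p->y = y;
}
```

Keep in mind that assert compiles to nothing when NDEBUG is defined, so these checks only help in debug builds.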
Finally, run the application in gdb. When it crashes, type up to inspect the calling stack frame and see what the variables looked like (you can usually write things like print x in the console). Likewise, down will move down the call stack if you need to.
As for the segfault, I would recommend running the application in valgrind. If you compile with debugging information (-g), it can often tell you the line of code that is causing the error (and can even catch errors that for unfortunate reasons don't crash right away).
I am not allowed to comment, but just wanted to reply for anyone looking more recently on the issue trying to find where the variables become (-65, -49). If you are getting a segfault you can get a core dump. Here is a pretty good source for making sure you can set up gdb to get a core dump. Then you can open your core file with gdb:
gdb -c myCoreFile
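Then set a breakpoint on the function call you'd like to step into (gdb needs a live process for breakpoints and stepping, e.g. one started with run; in the core file itself you can still use bt, up, down and print):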
b MyClass::myFunctionCall
Then step through with next or step to maneuver through lines of code:
step
or
next
When you are at a place in your code that you'd like to evaluate a variable you can print it:
p myVariable
or you can print all arguments:
info args
I hope this helps someone else looking to debug!

stack problem

I have a program that works when compiled with gcc 3.44, but when I compiled it again using 4.44 something goes wrong. Some of the local variables in a function seem to be modified by something unknown, so that a for loop never terminates because the variable in its condition keeps being reset to 0 even though it is incremented. Calling a function inside the loop seems to be okay because it returns to the correct address. I tried tracing the value of the variable that affects the loop, and found that the value is modified after calling a print function inside an if branch. Removing the print call, or adding more print calls, makes the problem go away, but I think it has nothing to do with the print function, and there's no code that modifies that variable except the increment in the loop. I also tried tracing esp at the beginning and end of the loop; it is the same. What could have caused the problem?
You stated that you're going from GCC v3.44 (where the code works) to v4.44 where the code is broken.
Make sure that all other parts of the program (all source files and library files) are also compiled with GCC v4.44. You're calling a print function, so I'm guessing you're referring to the standard printf function in glibc. So make sure that glibc is also compiled under v4.44.
If this is really a problem with your print functions, maybe you are corrupting the stack with some of the parameters of the variadic list? Maybe an assumption that you had about one of the standard data types or enumeration constants doesn't hold any more? Are these your own print functions? Then try to use the __attribute__ extension of gcc to have compile time type checks.
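As a rough sketch of the __attribute__ suggestion: GCC and Clang accept a format attribute on your own variadic functions, so that mismatched arguments are flagged when format warnings are enabled (for example with -Wall). The function name formatMessage and the 256-byte buffer are made up for this example.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// GCC/Clang extension: argument 1 is a printf-style format string and the
// variadic arguments to check start at argument 2.
std::string formatMessage(const char* fmt, ...)
    __attribute__((format(printf, 1, 2)));

// Hypothetical print helper; output longer than the buffer is truncated.
std::string formatMessage(const char* fmt, ...) {
    assert(fmt != nullptr);
    char buffer[256];
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buffer, sizeof buffer, fmt, args);
    va_end(args);
    return std::string(buffer);
}
```

With the attribute in place, a call such as formatMessage("%s", 42) should produce a format warning at compile time instead of silently reading the wrong type from the argument list at run time.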

Visual Studio 2005 C compiler problem when optimizing a switch statement

General Question which may be of interest to others:
I ran into what I believe is a C++ compiler optimization problem (Visual Studio 2005) with a switch statement. What I'd like to know is whether there is any way to satisfy my curiosity and find out what the compiler is trying, but failing, to do. Is there any log I can spend some time (probably too much time) deciphering?
My specific problem for those curious enough to continue reading - I'd like to hear your thoughts on why I get problems in this specific case.
I've got a tiny program with about 500 lines of code containing a switch statement. Some of its cases contain some assignment of pointers.
double *ptx, *pty, *ptz;
double **ppt = new double*[3];
//some code initializing etc ptx, pty and ptz
ppt[0]=ptx;
ppt[1]=pty; //<----- this statement causes problems
ppt[2]=ptz;
The middle statement seems to hang the compiler. The compilation never ends. OK, I didn't wait for longer than it took to walk down the hall, talk to some people, get a cup of coffee and return to my desk, but this is a tiny program which usually compiles in less than a second. Remove a single line (the one indicated in the code above) and the problem goes away, as it also does when removing the optimization (on the whole program or using #pragma on the function).
Why does this middle line cause a problem? The compiler's optimizer doesn't like pty.
There is no difference in the vectors ptx, pty, and ptz in the program. Everything I do to pty I do to ptx and ptz. I tried swapping their positions in ppt, but pty was still the line causing a problem.
I'm asking about this because I'm curious about what is happening. The code is rewritten and is working fine.
Edit:
Almost two weeks later, I check out the closest version to the code I described above and I can't edit it back to make it crash. This is really annoying, embarrassing and irritating. I'll give it another try, but if I don't get it breaking anytime soon I guess this part of the question is obsolete and I'll remove it. Really sorry for taking your time.
If you need to make this code compilable without changing it too much consider using memcpy where you assign a value to ppt[1]. This should at least compile fine.
However, your problem seems more likely to be caused by another part of the source code.
What you can also try is to put this stuff:
ppt[0]=ptx;
ppt[1]=pty; //<----- this statement causes problems
ppt[2]=ptz;
in another function.
This should also help the compiler a bit by steering it away from the path it is currently taking to compile your code.
Did you try renaming pty to something else (i.e. pt_y)? I encountered a couple of times (i.e. with a variable "rect2") the problem that some names seem to be "reserved".
It sounds like a compiler bug. Have you tried re-ordering the lines? e.g.,
ppt[1]=pty;
ppt[0]=ptx;
ppt[2]=ptz;
Also, what happens if you juggle the values that are assigned (which will introduce bugs in your code, but may indicate whether it's the pointer or the array that's the issue), e.g.:
ppt[0] = pty;
ppt[1] = ptz;
ppt[2] = ptx;
(or similar).
It's probably due to your declaration of ptx, pty and ptz with them being optimised out to use the same address. Then this action is causing your compiler problems later in your code.
Try
static double *ptx;
static double *pty;
static double *ptz;