Output of program depends on arbitrary print statements? - fortran

I have a Fortran 95 code whose output seems to be a function of things that it shouldn't be a function of. Specifically, the following scenerio is happening:
Run code with version A; it doesn't work (I mean, it works as in it compiles and runs, but it doesn't give the result I expect)
Run code with Version B; it works. Version B contains only trivial modifications to version A such as print statements or small changes in numerical values of variables.
Run code with version A; all of a sudden, it works.
I think there's some issue with memory or using variables before they're initialized, so I was wondering whether or not there was a way to check this sort of thing with gfortran, or if any one knows what the problem might be. I've tried gfortran my_program.f95 -Wall - Wextra, but it just gives me a bunch of complaints about nonconforming tab characters.

This was a while ago, but I fixed the problem so I figured I might as well post it. To be honest, I'm not sure whether or not these steps in particular are what fixed it, but it works, so here they are:
Put all procedures in modules (this also helps to organize the code) as opposed to just "out in the open."
Declare the intent (in, out or inout) of all variables via real, intent(in) :: foo. This is obviously useful for optimization and organization but apparently it has something to do with interfaces as well ... no idea what that's about.
And that's it!

Related

Fortran script only runs when print statement added

I am running an atmospheric model, and need to compile an executable to convert some files. If I compile the code as supplied, it runs but it gets stuck and doesn't ever complete. It doesn't give an error or anything like that.
After doing some testing by adding print statements to see where it was getting stuck, I've found that the executable only runs if I compile the code with a print statement in one of the subroutines.
The piece of code in question is the one here. Specifically, the code fails to run unless I put a print statement somewhere in the get_bottom_top_dim subroutine.
Does anyone know why this might be? It doesn't matter what the print statement is (currently I'm using print*, '!'). but as soon as I remove it or comment it out, the code no longer works.
I'm assuming it must have something to do with my machine or compiler (ifort 12.1.0), but I'm stumped as to what the problem is!
This is an extended comment rather than an answer:
The situation you describe, inserting a print statement which apparently fixes a program, often arises when the underlying problem is due to either
a) an attempt to access an element outside the declared bounds of an array; or
b) a mismatch between dummy and actual arguments to some procedure.
Recompile your program with the compiler options to check interfaces at compile-time and to check array bounds at run-time.
Fortran has evolved a LOT since I last used it but here's how to go about solving your problem.
Think of some hypotheses that could explain the symptoms, e.g. the compiler is optimizing the subroutine down to a no-op when it has no print side effect. Or a compiler bug is translating this code into something empty or an infinite loop or crashing code. (What exactly do you mean by "fails to run"?) Or the Linker is failing to link in some needed code unless the subroutine explicitly calls print.
Or there's a bug in this subroutine and the print statement alters its symptoms e.g. by changing which data gets overwritten by an index-out-of-bounds bug.
Think of ways to test these hypotheses. You might already have observations adequate to rule out of some of them. You could decompile the object code to see if this subroutine is empty. Or step through it in a debugger. Or replace the print statement with a different side effect like logging to a file or to an in-memory text buffer.
Or turn on all optional runtime memory checks and compile time warnings. Or simplify the code until the problem goes away, then binary search on bringing back code until the problem recurs.
Do the most likely or easiest tests first. Rule out some hypotheses, and iterate.
I had a similar bug and I found that the problem was in the dependencies on the makefile.
This was what I had:
I set a variable with a value and the program stops.
I write a print command and it works.
I delete the print statement and continues to work.
I alter the variable value and stops.
The thing is, the variable value is set in a parameters.f90
The print statement is in a file H3.f90 that depends on parameters.f90 but it was not declared on the makefile.
After correcting:
h3.o: h3.f90 variables.f90 parameters.f90
$(FC) -c h3.f90
It all worked properly.

Shedskin - Compile Error

I'm currently trying to compile a python project (5files # total 1200 lines of code) with shedskin.
I tried shedskin Version 0.9.3 and 0.9.2 both result in the same errors.
This is the first error I encounter:
mmain.cpp: In function ‘__shedskin__::list<__shedskin__::list<int>*>* __mmain__::list_comp_3(__shedskin__::__ss_int)’:
mmain.cpp:133: error: no matching function for call to ‘__shedskin__::list<__shedskin__::list<int>*>::append(__shedskin__::list<double>*)’
Moreover, I after running shedskin (i.e. before typing "make") I receive many warnings - all related to dynamic types:
*WARNING* mmain.py: expression has dynamic (sub)type: {float, int, list}
However, shedskin seems to work flawlessly with the provided examples since I can compile and execute them without any errors.
Do you have an idea where to look for the error or what the error is related to?
mmain.cpp:133: error: no matching function for call to ‘__shedskin__::list<__shedskin__::list<int>*>::append(__shedskin__::list<double>*)’
This error means that you've got a Python object that shedskin has inferred as a list of lists of ints, but now you're trying to append something that it's inferred as a list of floats. You can get that by, for example, doing this:
a = [[1], [2]]
b = 1.0
a.append([b])
However, from the line above it, the function name is list_comp_3. Unless you've actually named a function list_comp_3 (which you haven't), this is a list comprehension. So, you may be doing something like this:
a = [1, 2, 3.0]
b = [[i] for i in a]
You may be wondering why it let you get away with a but failed on b. Well, first of all, it probably didn't really let you get away with it, if you've got dozens of warnings you haven't dealt with. But second, as the documentation says:
Integers and floats can often be mixed, but it is better to avoid this where possible, as it may confuse Shed Skin:
a = [1.0]
a = 1 # wrong - use a float here, too
As for the warnings, they can mean anything from "you got away with it this time, but don't expect to always do so" to "an error is coming up related to this" to "this will compile, but to something less efficient than the original Python code rather than more" to "this will compile, but to something incorrect".
More generally, it sounds like your program just can't be statically typed by shedskin's inference engine. Without actually seeing your code, it's impossible to tell you what you're doing wrong, but if you re-read the Typing Restrictions and Python Subset Restrictions sections of the docs, that should give you ideas of what kinds of things are and aren't appropriate.
to avoid confusion, please note that both code snippets provided by 'abartert' compile and run fine when compiled separately (shedskin 0.9.3). my guess is also that the problem should disappear after resolving the dynamic typing warnings. if not, I'd be very interested in seeing the program you are trying to compile, or at least enough of it to reproduce the problem.
update: btw, as of 0.9.1 or so, shedskin should be smarter about int and float mixing. if it encounters something that would lead to broken or inefficient c++ code (because of necessary run-time conversion of sorts), it should now usually complain with an 'incompatible types' warning. so perhaps it's time to update this part of the documentation slightly for 0.9.3.

Same Program code with same compiler leads to different binaries

I have an issue with my code that has some very strange symptoms.
The code is compiled on my computer with the following versions:
a. GCC Version: 4.4.2
b. CMAKE verson: 2.8.7
c. QNX (operating system) version: 6.5.0
And the code has a segfault whilst freeing some memory and exiting from a function (not dying on any code, just on the exit from a function).
The weird things about this are:
The code does it in release mode but not debug mode:
a. The code is threaded so this indicates a race condition.
b. I cannot debug by putting it in debug mode.
The code when compiled on a workmates machine with the same versions of everything, does not have this problem.
a. The wierd things about this are that the workmates code works, but also that the binary created from compiling on his machine, which is the same, is about 6mB bigger.
Now annoyingly I cannot post the code because it is too big and also for work. But can anyone point me along a path to fixing this.
Since I am using QNX I am limited for my debug tools, I cannot use Valgrind and since it is not supported in QNX, GDB doesn't really help.
I am looking for anyone who has had a similar/same problem and what the cause was and how they fixed it.
EDIT:
Sooo... I found out what it was, but im still a bit confused about how it happened.
The culprit code was this:
Eigen::VectorXd msBb = data.modelSearcher->getMinimumBoundingBox();
where the definition for getMinimumBoundingBox is this:
Eigen::VectorXd ModelSearcher::getMinimumBoundingBox();
and it returns a VectorXd which is always initialised as VectorXd output(6, 1). So I immediately thought, right it must be because the VectorXd is not being initialised, but changing it to this:
Eigen::VectorXd msBb(6, 1); msBb = data.modelSearcher->getMinimumBoundingBox();
But this didn't work. In fact I had to fix it by changing the definition of the function to this:
void ModelSearcher::getMinimumBoundingBox(Eigen::MatrixXd& input);
and the call to this
Eigen::VectorXd msBb(6, 1); data.modelSearcher->getMinimumBoundingBox(msBb);
So now the new question:
What the hell? Why didn't the first change work but the second did, why do I have to pass by reference? Oh and the big question, how the hell didn't this break when my co-worker compiled it and I ran it? Its a straight out memory error, surely it shouldn't depend on which computer compiles it, especially since the compiler and all the other important things are the same!!??
Thanks for your help guys.
... the binary created from compiling on his machine, which is the same, is about 6mB bigger
It's worth figuring out what the difference is (even if it's just the case that his build hides, while yours exposes, a real bug):
double-check you're compiling exactly the same code (no un-committed local changes, no extra headers in the include search path, etc.)
triple-check by adding a -E switch to your gcc arguments in cmake, so it will pre-process your files with the same include path as regular compilation; diff the pre-processor output
compare output from nm or objdump or whatever you have to for your two linked executables: if some system or 3rd-party library is a different version on one box, it may show up here
compare output from ldd if it's dynamically linked, make sure they're both getting the same library versions
compare the library versions it actually gets at runtime too, if possible. Hopefully you can do one of: run pldd, compare the .so entries in /proc/pid/map, run the process under strace/dtrace/truss and compare the runtime linker activity
As for the code ... if this doesn't work:
Eigen::VectorXd ModelSearcher::getMinimumBoundingBox();
// ...
Eigen::VectorXd msBb(6, 1); msBb = data.modelSearcher->getMinimumBoundingBox();
and this does:
void ModelSearcher::getMinimumBoundingBox(Eigen::MatrixXd& input);
// ...
Eigen::VectorXd msBb(6, 1); data.modelSearcher->getMinimumBoundingBox(msBb);
you presumably have a problem with the assignment operator. If it does a shallow copy and there is dynamically-allocated memory in the vector, you'll end up with two vectors holding the same pointer, and they'll both free/delete it.
Note that if the operator isn't defined at all, the default is to do this shallow copy.
You said you have to change from:
void ModelSearcher::getMinimumBoundingBox(Eigen::MatrixXd& input);
What was it before?
If it was:
void ModelSearcher::getMinimumBoundingBox(Eigen::MatrixXd input);
and the copy constructors / assignment operators weren't implemented properly it might have caused the problem.
Please do check how they are both implemented. Here's some info that might help.

Segmentation fault when adding a vector. (C++)

So I've got a pretty basic class that has a few methods and some class variables. Everythings working great up until I add a vector to the member variables in the header file:
std::vector <std::string> vectorofstuff;
If all I do is add this line then my program run perfectly but at the end, after all the output is there, I get a message about a seg fault.
My first guess is that I need to call the destructor on the vector, but that didn't seem to work. Plus my understanding is I don't need to call the destructor unless I use the word 'new'.
Any pushes in the right direction? Thanks bunches!
I guess the following either happened to you, or it was something similar involving unrealised dependencies/headers. Either way, I hope this answer might show up on Google and help some later, extremely confused programmer figure out why they're suddenly observing arbitrary crashes.
So, from experience, this can happen if you compile a new version of SomeObject.o but accidentally have another object file #include an old version of SomeObject.hpp. This leads to corruption, which'll be caused by the compiler referring to outdated member offsets, etc. Sometimes this mostly works and only produces segfaults when destructing objects - either related or seemingly distant ones - and other times the program segfaults right away or somewhere in-between; I've seen several permutations (regrettably!).
For anyone wondering why this can happen, maybe this is just a reflection of how little sleep I get while programming, but I've encountered this pattern in the context of Git submodules, e.g.:
MyRepo
/ GuiSubmodule
/ HelperSubmodule
/ / GuiSubmodule
If (A) you have a new commit in GuiSubmodule, which has not yet been pulled into HelperSubmodule's copy, (B) your makefile compiles MyRepo/uiSubmodule/SomeObject.o, and (C) another translation unit - either in a submodule or in the main repo via the perils of #include - links against an older version of SomeObject.hpp that has a different class layout... You're in for a fun time, and a lot of chasing red herrings until you finally realise the simple mistake.
Since I had cobbled together my build process from scratch, I might've just not been using Git/make properly - or strictly enough (forgetting to push/pull all submodules). Probably the latter! I see fewer odd bugs nowadays at least :)
You are probably corrupting the memory of the vectorofstuff member somewhere within your class. When the class destructor is called the destructor of the vector is called as well, which would try to point and/or delete to invalid memory.
I was fooling around with it and decided to, just to be sure, do an rm on everything and recompile. And guess what? That fixed it. I have no idea why, in the makefile I do this anyway, but whatever, I'm just glad I can move on and continue working on it. Thanks so much for all the help!

Visual Studio 2005 C compiler problem when optimizing a switch statement

General Question which may be of interest to others:
I ran into a, what I believe, C++-compiler optimization (Visual Studio 2005) problem with a switch statement. What I'd want to know is if there is any way to satisfy my curiosity and find out what the compiler is trying to but failing to do. Is there any log I can spend some time (probably too much time) deciphering?
My specific problem for those curious enough to continue reading - I'd like to hear your thoughts on why I get problems in this specific case.
I've got a tiny program with about 500 lines of code containing a switch statement. Some of its cases contain some assignment of pointers.
double *ptx, *pty, *ptz;
double **ppt = new double*[3];
//some code initializing etc ptx, pty and ptz
ppt[0]=ptx;
ppt[1]=pty; //<----- this statement causes problems
ppt[2]=ptz;
The middle statement seems to hang the compiler. The compilation never ends. OK, I didn't wait for longer than it took to walk down the hall, talk to some people, get a cup of coffee and return to my desk, but this is a tiny program which usually compiles in less than a second. Remove a single line (the one indicated in the code above) and the problem goes away, as it also does when removing the optimization (on the whole program or using #pragma on the function).
Why does this middle line cause a problem? The compilers optimizer doesn't like pty.
There is no difference in the vectors ptx, pty, and ptz in the program. Everything I do to pty I do to ptx and ptz. I tried swapping their positions in ppt, but pty was still the line causing a problem.
I'm asking about this because I'm curious about what is happening. The code is rewritten and is working fine.
Edit:
Almost two weeks later, I check out the closest version to the code I described above and I can't edit it back to make it crash. This is really annoying, embarrassing and irritating. I'll give it another try, but if I don't get it breaking anytime soon I guess this part of the question is obsolete and I'll remove it. Really sorry for taking your time.
If you need to make this code compilable without changing it too much consider using memcpy where you assign a value to ppt[1]. This should at least compile fine.
However, you problem seems more like another part of the source code causes this behaviour.
What you can also try is to put this stuff:
ppt[0]=ptx;
ppt[1]=pty; //<----- this statement causes problems
ppt[2]=ptz;
in another function.
This should also help compiler a bit to avoid the path it is taking to compile your code.
Did you try renaming pty to something else (i.e. pt_y)? I encountered a couple of times (i.e. with a variable "rect2") the problem that some names seem to be "reserved".
It sounds like a compiler bug. Have you tried re-ordering the lines? e.g.,
ppt[1]=pty;
ppt[0]=ptx;
ppt[2]=ptz;
Also what happens if you juggle about the values that are assigned (which will introduce bugs in your code, but may indicator whether its the pointer or the array that's the issue), e.g.:
ppt[0] = pty;
ppt[1] = ptz;
ppt[2] = ptx;
(or similar).
It's probably due to your declaration of ptx, pty and ptz with them being optimised out to use the same address. Then this action is causing your compiler problems later in your code.
Try
static double *ptx;
static double *pty;
static double *ptz;