ifort dialect options for very old code - Fortran

I have suddenly been given a few very old Fortran codes to compile and get working for my research group. When I compile the code with ifort, I get the following error:
error #6526: A branch to a do-term-shared-stmt has occurred from outside the range of the corresponding inner-shared-do-construct.
Here is the bit of code that seems to be at fault:
...
      IF(THRU.GT.0.D0) GO TO 120                                        00011900
      L1=LL                                                             00012000
      A1=AA                                                             00012100
      DR1=DRR                                                           00012200
      RMAX1=RMAXX                                                       00012300
      RMIN1=RMINN                                                       00012400
      IF(DR1.EQ.0.D0) DR1=DRP                                           00012500
      KMAX1=(RMAX1-RMIN1)/DR1+1.D-08                                    00012600
      IF(KMAX1.GT.NN .OR. KMAX1.LE.0) KMAX1=NN                          00012700
      RINT1=RMAX1                                                       00012800
      IF(RMAX1.NE.0.D0)                                                 00012900
     2RMAX1=DR1*DFLOAT(KMAX1)+RMIN1                                     00013000
      IUP=KMAX1                                                         00013100
      R=RMIN1                                                           00013200
      DO 120 K1=1,KMAX1                                                 00013300
      R=R+DR1                                                           00013400
      XX(K1)=R                                                          00013500
  120 CONTINUE                                                          00013600
      WRITE(IW,940) NAM1,IOPT1,L1,A1,DR1,RMAX1,RMIN1,ANU1,BNU1,CNU1     00013700
  121 CONTINUE                                                          00013800
...
Being quite unfamiliar with Fortran, I have done some searching, and what I can see is that the compiler does not like the IF statement branching to the terminal statement of the DO loop. It also seems that some older dialects or compilers supported this.
My question is this: is there an option that will allow ifort to compile this successfully? That is, is there a specific dialect-compatibility option for ifort that will make this legal?
What are the side effects of this particular code on a compiler that would accept it? Is it possible that, aside from the WRITE statement, the effect is identical to jumping to label 121? Or was the DO loop perhaps supposed to terminate at 121?
I would consider modifying the code, were it not for the fact that my advisor told me not to make any changes whatsoever without consulting him first, so I am asking this question to see whether I need to consult him. That said, if my only option is to modify the code, suggestions are welcome, so that I have an idea of what needs to be done when I go to my advisor.

I'll note that the code in question isn't valid even as Fortran 77, but people did weird stuff back in the mists of time. I'll defer to any historian or person with experience; in particular, one should be very careful to understand any subtleties in the code in relation to my "answer" here.
If we assume that the intention of the goto is to jump to the code after the execution of the loop, then I can answer with a minimal change to the code.
Stripping the line-number columns for clarity, the block
      DO 120 K1=1,KMAX1
      R=R+DR1
      XX(K1)=R
  120 CONTINUE
      WRITE(IW,940) NAM1,IOPT1,L1,A1,DR1,RMAX1,RMIN1,ANU1,BNU1,CNU1
can be replaced by
      DO K1=1,KMAX1
      R=R+DR1
      XX(K1)=R
      END DO
  120 WRITE(IW,940) NAM1,IOPT1,L1,A1,DR1,RMAX1,RMIN1,ANU1,BNU1,CNU1
I would be quite surprised if the author of the code intended the loop to continue to 121: no variable referenced by the WRITE statement is updated in the loop. It is possible that the goto was intended to reference 121, but I do see that many variables would then be left un-updated in the skipped section.
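For completeness, here is a minimal, self-contained fixed-form sketch of the repaired pattern (hypothetical variable names, not the original program): the GO TO skips the loop by targeting the labeled statement after it, and the loop itself terminates on END DO.
      PROGRAM DEMO
C     Minimal sketch: SKIP plays the role of THRU; label 120 now marks
C     the statement after the loop, so the branch is legal Fortran.
      IMPLICIT NONE
      INTEGER K1, KMAX1
      DOUBLE PRECISION R, DR1, SKIP, XX(100)
      SKIP = 0.D0
      KMAX1 = 5
      R = 0.D0
      DR1 = 0.5D0
      IF (SKIP.GT.0.D0) GO TO 120
      DO K1 = 1, KMAX1
        R = R + DR1
        XX(K1) = R
      END DO
  120 WRITE (*,*) 'R =', R
      END
This compiles cleanly with ifort as fixed-form source and prints R = 2.5 when the branch is not taken.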

Related

Preserving line numbers when pre-processing Fortran code

I have inherited a 10K-line Fortran code that uses fpx3 for pre-processing. I should say at this point that I don't have much Fortran experience, and this is my first time dealing with a code that has a pre-processor.
The code works fine, except that the pre-processing creates secondary Fortran files (e.g., main.f90 creates t.main.f90) that do not preserve the total number of lines. The reason, of course, is that some code is lost during pre-processing (IF-ELSE clauses, pre-processor directives, etc.).
This would be fine, except that when I get a bug in the code, the line number I get from the bug refers to the pre-processed code (e.g., t.main.f90) instead of the original code.
This isn't a major problem, but for nearly every bug I have to take the reported line (say, line 80 of t.main.f90) and manually find the corresponding line in the original (let's say it ends up being line 92 of main.f90). I've tried to get around this by telling fpx3 to comment out the unused lines instead of throwing them out, but I couldn't find much info on fpx3 online.
What's the best way around this?
P.S.: I don't know if it matters, but I'm using ifort to compile.
Try -fixed, since a .f90 file is assumed to be free form (-free); fixed form preserves the first few columns and the continuation character in column 6.
If the sequence numbers are on the right-hand side, then use -72 or -80 rather than -132, so that the numbers are not treated as "compilable" code. I use -132 myself, so you will have to look up the right switch.
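For example (assuming the generated t.main.f90 really is fixed-form source with sequence numbers in columns 73-80; the file name is just the one from the question), the invocation might look like:
ifort -fixed -72 -c t.main.f90
With -72, anything beyond column 72 is ignored, so the sequence numbers are not parsed as code.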

Parentheses around multiple values in implied DO loop - "error: expected a right parenthesis in expression"

When migrating from GNU Fortran Compiler 4.3.2 to 4.8.5, a user got an error:
p.for:227.25:
      write(3,446) ((r(i),ERC(i),EIC(i),ERp(i),EIp(i)),i=1,I1)
                         1
Error: Expected a right parenthesis in expression at (1)
Here the output list is built using the implied DO-loop construct. The error persists even when compiling with --std=legacy.
The problem is easy to fix: when I remove the parentheses around the list of array elements, the code compiles and seems to work as expected.
write(3,446) (r(i),ERC(i),EIC(i),ERp(i),EIp(i),i=1,I1)
I would be content with this fix; however, our users often store multiple copies of the code on different systems and exchange code between them seemingly at random. On the other hand, the makefiles and build scripts are usually specific to our cluster.
It would be nice to find that there is a command line option that will let such code pass syntax checking. I could not find anything like -ffixed-parenthesized-values-in-implied-loop in the list of available GNU Fortran Compiler Language Options.
Q: Is it possible to tweak a build script to let such statement compile without modifying the source code?

Boost::spirit illegal_backtracking exception

I use Boost.Spirit.Lex and .Qi for a simple calculator project, and (as usual) it gives me some pain to debug and use. The debug prints:
<expression>
<try>boost::spirit::multi_pass::illegal_backtracking
This exception is thrown and I can't understand why. I use macros in my code and it would be a pain to produce a minimal example, so I give the whole project. Just do "make" at the root and then launch ./sash; a prompt will appear. If you want to test, just do "-echo $5-8".
It seems that Google didn't find any similar problems about this exception...
The parser is in arithmetic/, and the call of the parser is at the end of arithmetic/evaluator.cpp
Any help is greatly appreciated.
Your code is breaking because BOOST_SPIRIT_QI_DEBUG as well as the on_error<> handler seem to use iterators after they might have been invalidated.
To be honest, I'm not completely sure how this could happen.
Background
AFAICT lexertl uses spirit::multi_pass<> with a split_functor input policy and a split_std_deque storage policy [1].
Now, (luckily?) the checking policy is buf_id_check which means that the iterator will check for invalidation at the time of dereference.
Iterators are expected to be invalidated if
the iterator is dereferenced, growing the buffer to >16 tokens, and the iterator is the only one referring to the shared state,
or somewhere along the line clear_queue is called explicitly (e.g. from the flush_multi_pass primitive in the Spirit Repository)
Honestly I don't see any of these two conditions being met. A quick and dirty
token_iterator_type clone = iter; // just to make it non-unique...
in evaluator.cpp doesn't make a difference (ruling out reason #1)
Temporarily disabling the docheck implementation in the buf_id_check_policy made valgrind point out that on_error<> and BOOST_SPIRIT_DEBUG* were causing invalid memory references. Commenting out both indeed makes all problems go away (and eval_expression now works).
However, this is likely not your preferred solution.
Proposed solution
Since
you're working on a fixed, in-memory container representing the input, so you don't really need multi_pass behaviour emulation
you're using a trivial grammar, so you don't really benefit from lexertl, while you are getting a lot of added complexity (as you can see)
I've quickly refactored some code: https://github.com/sehe/sash-refactor/commits/master
commit dec31496 sanity - lets do without macros
4 files changed, 59 insertions(+), 146 deletions(-)
commit 6056574c dead code, excess scope, excess instantiation
5 files changed, 38 insertions(+), 62 deletions(-)
commit 99d441db remove lexer
9 files changed, 25 insertions(+), 177 deletions(-)
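To give a flavor of the lexer-free approach, here is a fresh minimal sketch (not code from those commits; the grammar and names are illustrative): a Qi calculator working directly on the string's own iterators, with no lexertl and no multi_pass wrapper involved.
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;

template <typename It>
struct calculator : qi::grammar<It, long(), qi::space_type>
{
    calculator() : calculator::base_type(expr)
    {
        using qi::_val; using qi::_1; using qi::long_;

        expr   = term              [_val = _1]
               >> *( ('+' >> term  [_val += _1])
                   | ('-' >> term  [_val -= _1]) );
        term   = factor            [_val = _1]
               >> *( ('*' >> factor[_val *= _1])
                   | ('/' >> factor[_val /= _1]) );
        factor = long_             [_val = _1]
               | ('(' >> expr      [_val = _1] >> ')')
               | ('-' >> factor    [_val = -_1])
               | ('+' >> factor    [_val = _1]);
    }

    qi::rule<It, long(), qi::space_type> expr, term, factor;
};

int main()
{
    std::string const input = "8*(9-1)";
    long result = 0;
    auto f = input.begin(), l = input.end();

    // phrase_parse runs directly over the string's random-access
    // iterators, so on_error<> and BOOST_SPIRIT_DEBUG see iterators
    // that stay valid for the whole parse.
    bool ok = qi::phrase_parse(f, l,
        calculator<std::string::const_iterator>(), qi::space, result);

    if (ok && f == l) std::cout << result << "\n";   // prints 72
    else              std::cout << "parse failed\n";
}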
Now, you will find that your code is generally much simpler and much shorter, does not run into multi_pass limits, and you can still have SPIRIT_DEBUG as well as on_error handling :) In the end:
binary size in -g3 is reduced from 16Mb to 6.5Mb
a net 263 lines of code have been removed
more importantly, it works
Here's some samples (without debug output):
$ ./sash <<< '-echo $8-9'
-1
Warning: Empty environment variable "8-9".
$ ./sash <<< '-echo $8\*9'
72
Warning: Empty environment variable "8*9".
$ ./sash <<< '-echo $8\*(9-1)'
64
Warning: Empty environment variable "8*(9-1)".
$ ./sash <<< '-echo $--+-+8\*(9-1)'
-64
Warning: Empty environment variable "--+-+8*(9-1)".
[1] Which, despite its name, buffers previously seen tokens in a std::vector<>

How do I associate changed lines with functions in a git repository of C code?

I'm attempting to construct a "heatmap" from a multi-year history stored in a git repository, where the unit of granularity is individual functions. Functions should grow hotter as they change more times, more frequently, and with more non-blank lines changed.
As a start, I examined the output of
git log --patch -M --find-renames --find-copies-harder --function-context -- *.c
I looked at using Language.C from Hackage, but it seems to want a complete translation unit (expanded headers and all) rather than being able to cope with a source fragment.
The --function-context option is new since version 1.7.8. The foundation of the implementation in v1.7.9.4 is a regex:
PATTERNS("cpp",
/* Jump targets or access declarations */
"!^[ \t]*[A-Za-z_][A-Za-z_0-9]*:.*$\n"
/* C/++ functions/methods at top level */
"^([A-Za-z_][A-Za-z_0-9]*([ \t*]+[A-Za-z_][A-Za-z_0-9]*([ \t]*::[ \t]*[^[:space:]]+)?){1,}[ \t]*\\([^;]*)$\n"
/* compound type at top level */
"^((struct|class|enum)[^;]*)$",
/* -- */
"[a-zA-Z_][a-zA-Z0-9_]*"
"|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
"|[-+*/<>%&^|=!]=|--|\\+\\+|<<=?|>>=?|&&|\\|\\||::|->"),
This seems to recognize boundaries reasonably well but doesn't always leave the function as the first line of the diff hunk, e.g., with #include directives at the top or with a hunk that contains multiple function definitions. An option to tell diff to emit separate hunks for each function changed would be really useful.
This isn't safety-critical, so I can tolerate some misses. Does that mean I likely have Zawinski's "two problems"?
I realise this suggestion is a bit tangential, but it may help to clarify and rank requirements. This would work for C or C++ ...
Instead of trying to find text blocks which are functions and comparing them, use the compiler to make binary blocks. Specifically, for every C/C++ source file in a change set, compile it to an object. Then use the object code as a basis for comparisons.
This might not be feasible for you, but IIRC there is an option on gcc to compile so that each function is compiled to an 'independent chunk' within the generated object code file. The linker can pull each 'chunk' into a program. (It is getting pretty late here, so I will look this up in the morning, if you are interested in the idea.)
So, assuming we can do this, you'll have lots of functions defined by chunks of binary code, so a simple 'heat' comparison is 'how much longer or shorter is the code between versions for any function?'
I am also thinking it might be practical to use objdump to reconstitute the assembler for the functions. I might use some regular expressions at this stage to trim off the register names, so that changes to register allocation don't cause too many false positives.
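For what it's worth, the gcc option alluded to above is presumably -ffunction-sections, which places each function in its own section of the object file. A rough, untested sketch of that pipeline (paths and names illustrative):
# compile each revision with one section per function
gcc -c -ffunction-sections old/foo.c -o old/foo.o
gcc -c -ffunction-sections new/foo.c -o new/foo.o

# list per-function code sizes: each function lands in .text.<name>
objdump --section-headers old/foo.o | grep '\.text\.'
objdump --section-headers new/foo.o | grep '\.text\.'

# disassemble a single function's section for a finer-grained diff
objdump -d --section=.text.my_function old/foo.o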
I might even try to sort the assembler instructions in the function bodies, and diff them to get a pattern of "removed" vs "added" between two function implementations. This would give a measure of change which is pretty much independent of layout, and even somewhat independent of the order of some of the source.
So it might be interesting to see if two alternative implementations of the same function (i.e. from different change sets) are the same instructions :-)
This approach should also work for C++ because all names have been appropriately mangled, which should guarantee the same functions are being compared.
So, the regular expressions might be kept very simple :-)
Assuming all of this is straightforward, what might this approach fail to give you?
Side Note: This basic strategy could work for any language which targets machine code, as well as VM instruction sets like the Java VM Bytecode, .NET CLR code, etc too.
It might be worth considering building a simple parser, using one of the common tools, rather than just using regular expressions. Clearly it is better to choose something you are familiar with, or which your organisation already uses.
For this problem, a parser doesn't actually need to validate the code (I assume it is valid when it is checked in), and it doesn't need to understand the code, so it might be quite dumb.
It might throw away comments (retaining new lines), ignore the contents of text strings, and treat program text in a very simple way. It mainly needs to keep track of balanced '{' '}', balanced '(' ')' and all the other valid program text is just individual tokens which can be passed 'straight through'.
Its output might be a separate file per function, to make tracking easier.
If the language is C or C++, and the developers are reasonably disciplined, they might never use 'non-syntactic macros'. If that is the case, then the files don't need to be preprocessed.
Then the parser is mostly just looking for the function name (an identifier) at file scope, followed by ( parameter-list ) { ... code ... }
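As a very rough illustration of how dumb such a scanner can be, here is a hypothetical C++ sketch. It ignores comments, strings, preprocessor lines, and brace initializers, and it would misreport struct bodies, so it is nowhere near production quality:
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

// One extracted definition: the identifier that preceded the parameter
// list, plus the [begin, end] byte offsets of its brace-balanced body.
struct Function { std::string name; std::size_t begin, end; };

std::vector<Function> scan(std::string const& src)
{
    std::vector<Function> out;
    int depth = 0;                // '{' ... '}' nesting
    std::string lastIdent;        // most recent identifier at file scope
    std::string candidate;        // identifier seen just before a '('
    Function cur{};

    for (std::size_t i = 0; i < src.size(); ++i) {
        unsigned char c = src[i];
        if (std::isalpha(c) || c == '_') {        // scan an identifier
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum((unsigned char)src[j]) || src[j] == '_'))
                ++j;
            if (depth == 0) lastIdent = src.substr(i, j - i);
            i = j - 1;
        } else if (c == '(' && depth == 0) {
            candidate = lastIdent;                // likely the function name
        } else if (c == '{') {
            if (depth++ == 0) cur = {candidate, i, 0};
        } else if (c == '}' && depth > 0) {
            if (--depth == 0) { cur.end = i; out.push_back(cur); }
        }
    }
    return out;
}

int main()
{
    std::string code = "static int add(int a, int b) { return a + b; }\n"
                       "void noop(void) {}\n";
    for (auto const& f : scan(code))
        std::cout << f.name << ": bytes [" << f.begin
                  << ", " << f.end << "]\n";
}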
I'd SWAG it would be a few days' work using yacc & lex / flex & bison, and it might be so simple that there is no need for a parser generator.
If the code is Java, then ANTLR is a possibility, and I think there was a simple Java parser example.
If Haskell is your focus, there may be published student projects which have made a reasonable stab at a parser.

Semicolon in C++?

Is the "missing semicolon" error really required? Why not treat it as a warning?
When I compile this code
int f = 1
int h=2;
the compiler intelligently tells me where I am missing it. But to me it's like: "If you know it, just treat it as if it's there and go ahead. (Later I can fix the warning.)"
int sdf = 1, df=2;
sdf=1 df =2
Even for this code, it behaves the same. That is, even if multiple statements (without ;) are in the same line, the compiler knows.
So, why not just remove this requirement? Why not behave like Python, Visual Basic, etc.?
Summary of discussion
Two examples/instances were given where automatically inserting a semicolon would actually cause a problem.
1.
return
(a+b)
This was presented as one of the worst aspects of JavaScript. But in this scenario, semicolon insertion is a problem for JavaScript, not for C++: in C++, you will get another error if ; insertion is done after return, namely a missing return value.
2
int *y;
int f = 1
*y = 2;
For this, I guess there is no better way than to introduce a statement separator, that is, a semicolon.
It's very good that the C++ compiler doesn't do this. One of the worst aspects of JavaScript is the semicolon insertion. Picture this:
return
(a + b);
The C++ compiler will happily continue on the next line as expected, while a language that "inserts" semicolons, like JavaScript, will treat it as "return;" and miss out the "(a + b);".
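A quick sketch to make that concrete (hypothetical functions, easily verified with any C++ compiler): the line break after return is just whitespace, so both functions below return the same value, whereas a JavaScript-style insertion rule would turn the second one into a bare return;.
#include <iostream>

int sum_inline(int a, int b) { return (a + b); }

int sum_wrapped(int a, int b) {
    return          // no semicolon is inserted here; parsing continues
        (a + b);
}

int main() {
    std::cout << sum_inline(2, 3) << " " << sum_wrapped(2, 3) << "\n";  // 5 5
}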
Instead of relying on compiler error-fixing, make it a habit to use semicolons.
There are many cases where a semicolon is needed.
What if you had:
int *y;
int f = 1
*y = 2;
This would be parsed as
int *y;
int f = 1 * y = 2;
So without the semicolons it is ambiguous.
First, this is only a small example; are you sure the compiler can intelligently tell you what's wrong for more complex code? For any piece of code? Could all compilers intelligently recognize this in the same way, so that a piece of C++ code could be guaranteed portable with missing semicolons?
Second, C++ was created more than a decade ago, when computing resources weren't nearly what they are now. Even today, builds can take a considerable amount of time. Semicolons help to clearly demarcate different commands (for the user and for the compiler!) and assist both the programmer and the compiler in understanding what's happening.
The ; is for the programmer's convenience: if a line of code is very long, we can press enter and continue on the next line, because the ; serves as the statement separator. It is a programming convention; there must be a statement separator.
Having semi-colons (or line breaks, pick one) makes the compiler vastly simpler and error messages more readable.
But contrary to what other people have said, neither form of delimiters (as an absolute) is strictly necessary.
Consider, for example, Haskell, which doesn’t have either. Even the current version of VB allows line breaks in many places inside a statement, as does Python. Neither requires line continuations in many places.
For example, VB now allows the following code:
Dim result = From element In collection
             Where element < threshold
             Select element
No statement delimiters, no line continuations, and yet no ambiguities whatsoever.
Theoretically, this could be driven much further. All ambiguities can be eliminated (again, look at Haskell) by introducing some rules. But again, this makes the parser much more complicated (it has to be context sensitive in a lot of places, e.g. your return example, which cannot be resolved without first knowing the return type of the function). And again, it makes it much harder to output meaningful diagnostics since an erroneous line break could mean any of several things so the compiler cannot know which error the user has made, and not even where the error was made.
In C programs semicolons are statement terminators, not separators. You might want to read this fun article.
+1 to you both.
The semi-colon is a statement delimiter, unlike in VB, Python, etc. C and C++ ignore whitespace within lines of code, including carriage returns! This was originally because, at the inception of C, computer monitors could only cope with 80 characters of text, and as C++ is based on the C specification, it followed suit.
I could post up the question: "Why must I keep getting errors about missing _ characters in VB when I try to write code over several lines? Surely if VB knows of the problem it can insert it?"
Auto insertion as has already been pointed out could be a nightmare, especially on code that wraps onto a second line.
I won't dwell much on the need for semi-colons vs line continuation characters; both have advantages and disadvantages, and in the end it's a simple language design choice (even though it affects all the users).
I am more worried about the suggestion for the compiler to fix the code.
If you have ever seen a marvelous tool (such as... hmm, let's pick a merge tool) and the way it does its automated work, you would be very glad indeed that the compiler does not modify the code. Ultimately, if the compiler knew how to fix the code, it would mean it knew your intent, and thought transmission has not been implemented yet.
As for the warning? Any programmer worth their salt knows that warnings should be treated as errors (and compilation stopped), so what would be the advantage?
int sdf = 1,df=2;
sdf=1 df =2
I think the general problem is that, without the semicolon, there's no telling what the programmer could actually have meant (e.g., maybe the second line was intended as sdf = 1 + df - 2; with serious typos). Something like this might well result from completely arbitrary typos and have any intended meaning, so it might not be such a good idea after all to have the compiler silently "correct" errors.
You may also have noticed that you often get "expected semicolon" where the real problem is not a lack of a semicolon but something completely different instead. Imagine a malformed expression that the compiler could make sense of by silently going and inserting semicolons.
The semicolon may seem redundant but it is a simple way for the programmer to confirm "yes, that was my intention".
Also, warnings instead of compiler errors are too weak. People compile code with warnings off, ignore warnings they get, and AFAIK the standard never prescribes what the compiler must warn about.