FORTRAN IV/66 program stalls in DO loops

I copied a FORTRAN IV program from a thesis, so it presumably worked at the time it was written. I compiled it with gfortran. When running, it stalls in an integration subroutine. I have tried easing off the residuals but to no avail. I am asking for help because (presuming no mistakes in code) gfortran might not like the archaic 66/IV code, and updating it is outside my abilities.
The program gets stuck by line 9, so I wonder if the DO loops are responsible. Note, lines 1 and 6 are unusual to me because ',1' has been added to the ends: e.g. =1,N,1.
I don't think it's necessary to show the FUNC subroutine called on line 5 but am happy to provide it if necessary.
If you need more detailed information I am happy to provide it.
00000001 13 DO 22 TDP=QDP,7,1
00000002 TD=TDP-1
00000003 X=X0+H0
00000004 IF(TD.EQ.QD) GOTO 15
00000005 CALL FUNC(N,DY,X,Y,J)
00000006 15 DO 21 RD=1,N,1
00000007 GOTO (120,121,122,123,124,125,126),TDP
00000008 120 RK(5*N*RD)=Y(RD)
00000009 GOTO 21
00000010 121 RK(RD)=HD*DY(RD)
00000011 H0=0.5*HD
00000012 F0=0.5*RK(RD)
00000013 GOTO 20
00000014 122 RK(N+RD)=HD*DY(RD)
00000015 F0=0.25*(RK(RD)+RK(N+RD))
00000016 GOTO 20
00000017 123 RK(2*N+RD)=HD*DY(RD)
00000018 H0=HD
00000019 F0=-RK(N+RD)+2.*RK(2*N+RD)
00000020 GOTO 20
00000021 124 RK(3*N+RD)=HD*DY(RD)
00000022 H0=0.66666666667*HD
00000023 F0=(7.*RK(RD)+10.*RK(N+RD)+RK(3*N+RD))/27.
00000024 GOTO 20
00000025 125 RK(4*N+RD)=HD*DY(RD)
00000026 H0=0.2*HD
00000027 F0=(28.*RK(RD)-125.*RK(N+RD)+546.*RK(2*N+RD)+54.*RK(3*N+RD)-
00000028 1378.*RK(4*N+RD))/625.
00000029 GOTO 20
00000030 126 RK(6*N+RD)=HD*DY(RD)
00000031 F0=0.1666666667*(RK(RD)+4.*RK(2*N+RD)+RK(3*N+RD))
00000032 X=X0+HD
00000033 ER=(-42.*RK(RD)-224.*RK(2*N+RD)-21.*RK(3*N+RD)+162.*RK(4*N+RD)
00000034 1+125.*RK(6*N+RD))/67.2
00000035 YN=RK(5*N+RD)+F0
00000036 IF(ABS(YN).LT.1E-8) YN=1
00000037 ER=ABS(ER/YN)
00000038 IF(ER.GT.G0) GOTO 115
00000039 IF(ED.GT.ER) GOTO 20
00000040 QD=-1
00000041 20 Y(RD)=RK(5*N+RD)+F0
00000042 21 CONTINUE
00000043 22 CONTINUE

It's difficult to be certain (not entirely sure your snippet exactly matches your source file) but your problem might arise from an old FORTRAN gotcha -- a 0 in column 6 is (or rather was) treated as a blank. Any other (non-blank) character in column 6 is/was treated as a continuation indicator, but not the 0.
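To make the column rules concrete, here is the statement from your lines 27-28 laid out the way a fixed-form compiler reads it (assuming the original punched layout matched this alignment; the first line is just a comment used as a column ruler):
C234567  <- ruler: columns 1-5 label field, column 6 continuation, 7-72 statement
      F0=(28.*RK(RD)-125.*RK(N+RD)+546.*RK(2*N+RD)+54.*RK(3*N+RD)-
     1378.*RK(4*N+RD))/625.
C     The "1" above sits in column 6, so the second line continues the
C     first statement. A "0" or a blank in that column would end the
C     statement instead, and "378.*RK(4*N+RD))/625." would be taken as
C     the start of a new statement (a syntax error here).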

Not all f66 compilers adhered to the convention of executing a loop at least once, but it was a common (non-portable) assumption.
Similarly, the assumption that all local variables were static was not a portable one, but from f77 onwards the behaviour can be requested explicitly by adding a SAVE statement. A further assumption that SAVEd variables will be zero-initialized is even more non-portable, but most compilers have an option to implement that.
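A minimal sketch of what a SAVE looks like in practice (hypothetical subroutine and variable names, not taken from the code above):
      SUBROUTINE COUNTR(K)
      INTEGER K, NCALLS
C     SAVE guarantees NCALLS survives between calls; f66 code often just
C     assumed this (and assumed a starting value of zero)
      SAVE NCALLS
      DATA NCALLS /0/
      NCALLS = NCALLS + 1
      K = NCALLS
      RETURN
      END
With gfortran, the -fno-automatic and -finit-local-zero options approximate the old "everything is static and starts at zero" behaviour for code that relies on it without saying so.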
If an attempt is being made to resurrect old code, it is probably worthwhile to get it working before modernizing it incrementally so as to make it more self-documenting. The computed goto looks like a relatively sane one which could be replaced by select case (see the sketch below), at a possible expense of optimization. Here the recent uses of the term "modernization" pull in opposite directions.
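For what it's worth, a sketch of the shape such a replacement takes (a free-standing toy, not a drop-in rewrite of the routine above):
      PROGRAM DEMO
      INTEGER TDP
      TDP = 3
C     old style:  GOTO (120,121,122,123,124,125,126), TDP
      SELECT CASE (TDP)
      CASE (1)
         WRITE (6,*) 'what used to follow label 120'
      CASE (2)
         WRITE (6,*) 'what used to follow label 121'
      CASE (3:7)
         WRITE (6,*) 'what used to follow labels 122-126'
      CASE DEFAULT
C        a computed GOTO with an out-of-range index falls through to the
C        next statement; CASE DEFAULT makes that explicit
         CONTINUE
      END SELECT
      END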

The ,1 bits are there to help the compiler spot typos. It is quite common to do the following
DO 10 I = 1.7
That is perfectly legal: blanks are insignificant in fixed-form source, so the line is parsed as an assignment to a variable called DO10I rather than as a loop. If you wish to avoid that trap, put in the extra number. The following will all generate errors
DO 10 I = 1.7,1
DO 10 I = 1,7.1
DO 10 I = 1.7.1
Re the program getting stuck, try putting a CONTINUE statement between labels 21 and 22. The IF-GOTO is the same as IF-NOT-THEN in the later versions of Fortran, and the computed GOTO is the same as a select statement. You don't need to recode it: there is nothing wrong with it other than youngsters getting confused whenever they see GOTO. All you need to do is indent it and it becomes obvious. So what you will have is
DO 22 TDP = QDP, 7, 1
...
DO 23 RD = 1, N, 1
GOTO (...) TDP
...
GOTO 21
...
GOTO 20
...
GOTO 20
...
20 CONTINUE
Y(RD) = ...
21 CONTINUE
23 CONTINUE
22 CONTINUE
You will probably end up with far more code if you try recoding it. It will look exactly the same except that gotos have been replaced by other words. It is possible that the compiler is generating the wrong code so just help it by putting a few dummy (CONTINUE) statements.

Related

c/c++: How can I know the size of used flash memory?

I recently faced a flash overflow problem. After doing some optimization in the code, I saved some flash memory and executed the software successfully. I want to know how much flash memory is saved through my changes. Please let me know how I can check for used flash / available flash memory. Also I want to know how much flash is utilized by a particular function/file.
Below mentioned are some info about my developing environment.
- Avr microcontroller with 64 k ram and 512 K flash.
- Using freeRtos.
- Using GNU C++ compiler.
- Using AVRATJTAGEICE for programming and Debugging.
Please let me know the solution.
Regards,
Jagadeep.
GCC's size program is what you're looking for.
size can be passed the full compiled .elf file. It will, by default, output something like this:
$ size linked-file.elf
text data bss dec hex filename
11228 112 1488 12828 321c linked-file.elf
This is saying:
There are 11228 bytes in the .text "section" of this file. This is generally for functions.
There are 112 bytes of initialized data: global variables in the program with initial values.
There are 1488 bytes of uninitialized data: global variables without initial values.
dec is simply the sum of the previous 3 values: 11228 + 112 + 1488 = 12828.
hex is simply the hexadecimal representation of the dec value: 0x321c == 12828.
For embedded systems, generally dec needs to be smaller than the flash size of your target device (or the available space on the device).
It is generally sufficient to simply watch the dec or text outputs of GCC's size command to monitor the size of your compiled code over time. A large jump in size often indicates a poorly implemented new feature or constexprs that are not getting compiled away. (Don't forget -ffunction-sections and -fdata-sections.)
Note: For AVRs, you'll want to use avr-size for checking the linked size of AVR .elf files. avr-size takes an extra argument naming the target chip and will automatically calculate the percentage of used flash for your chosen chip.
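For example, with the AVR-patched binutils that ship with the usual avr-gcc toolchains (the chip name here is just a placeholder for whatever part you actually use):
avr-size -C --mcu=atmega2560 linked-file.elf
This prints program and data usage both in bytes and as a percentage of that chip's flash and RAM. If your avr-size doesn't accept -C, fall back to the plain Berkeley output shown above.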
GCC's size also works directly on intermediate object files.
This is particularly useful if you want to check the compiled size of functions.
You should see something like this excerpt:
$ size -A main.cpp.o
main.cpp.o :
section size addr
.group 8 0
.group 8 0
.text 0 0
.data 0 0
.bss 0 0
.text._Z8sendByteh 8 0
.text._ZN3XMC5IOpin7setModeENS0_4ModeE 64 0
.text._ZN7NamSpac6OptionIN5Clock4TimeEEmmEi 76 0
.text.Default_Handler 24 0
.text.HardFault_Handler 16 0
.text.SVC_Handler 16 0
.text.PendSV_Handler 16 0
.text.SysTick_Handler 28 0
.text._Z5errorPKc 8 0
.text._ZN7NamSpac5Motor2goEi 368 0
.text._ZN7NamSpac5Motor3getEv 12 0
.rodata.cst1 1 0
.text.startup.main 632 0
.text._ZN7NamSpac7Program3runEv 380 0
.text._ZN7NamSpac8Position4tickEv 24 0
.text.startup._GLOBAL__sub_I__ZN7NamSpac7displayE 292 0
.init_array 4 0
.bss._ZN5Debug9formatterE 4 0
.rodata._ZL10dispDigits 8 0
.bss.position 4 0
.bss.motorState 4 0
.bss.count 4 0
.rodata._ZL9diameters 20 0
.bss._ZN7NamSpac8diameterE 16 0
.bss._ZN5Debug3pinE 12 0
.bss._ZN7NamSpac7displayE 24 0
.rodata.str1.4 153 0
.rodata._ZL12dispSegments 32 0
.bss._ZL16diametersDisplay 10 0
.bss.loadAggregate 4 0
.bss.startCount 4 0
.bss._ZL15runtimesDisplay 10 0
.bss._ZN7NamSpac7runtimeE 16 0
.bss.startTime 4 0
.rodata._ZL8runtimes 20 0
.comment 111 0
.ARM.attributes 49 0
Total 2494
Please let me know the solution.
Sorry, there's no "the solution"! You've got to go through what's linked into your final ELF and decide whether it was linked by intent or by unwanted default.
Please let me know how I can check for used flash / available flash memory.
That primarily depends on your actual target hardware platform; you have to manage to get your .text section fitting into the flash that is there.
Also I want to know how much flash is utilized by a particular function/file.
The nm tool of the GNU binutils provides detailed information about any (global) symbol found in an ELF file and the space it occupies in its associated section. You'll just need to grep the results for particular functions/classes/namespaces (best demangled!) to accumulate section-type- and symbol-filtered outputs for analysis.
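For instance (firmware.elf is just a placeholder name; on AVR use the avr- prefixed tools):
avr-nm --print-size --size-sort --radix=d -C firmware.elf
The second column is the symbol size in decimal bytes, sorted ascending, so the biggest code contributors end up at the bottom; grep the demangled names for the functions, classes or namespaces you care about.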
That's the approach I've been using for a little tool called nmalyzr. Sorry to say, as it stands on the Git repo, it's not really working as intended (I've got working versions that aren't pushed back).
In general, it's a good strategy to chase down code that has #include <iostream> statements (no matter whether std::cout or the like is actually used, static instances are provided!), or unwanted newlib/libstdc++ bindings, e.g. for default exception handling.
Use size command from binutils on the generated elf file. As you seem to use an AVR chip, use avr-size.
To get the size of functions, use nm command from binutils (avr-nm on AVR chips).

gdb strange behaviour ( [next] jumps few lines back on a block code)

I have noticed pretty bizarre behaviour of gdb when debugging a straight block of code.
I ran gdb normally with the following commands.
gdb ./exe
break main
run
next
then [enter] a few times.
What I got as a result was
35 world.generations(generations);
(gdb)
36 world.popSize(100);
(gdb)
37 world.eliteSize(5);
(gdb)
41 world.setEvaluationFnc( eval );
(gdb)
37 world.eliteSize(5);
(gdb)
39 world.pXOver(0.9);
(gdb)
38 world.pMut(0.9);
(gdb)
41 world.setEvaluationFnc( eval );
(gdb)
There is absolutely no reason to run over those lines twice. I do not understand this behaviour. The code looks as follows:
(gdb) list 39
34 SimpleGA<MySpecimen> world;
35 world.generations(generations);
36 world.popSize(100);
37 world.eliteSize(5);
38 world.pMut(0.9);
39 world.pXOver(0.9);
40
41 world.setEvaluationFnc( eval );
42
43 world.setErrorSink(stderrSink);
I am not sure whether I should disregard it or whether there is something wicked going on in my code. The app uses OpenMP and is compiled to use it. However, info threads says there is only one thread running. Also, everything seems to give proper results, and even if those lines were executed twice there should be no problem, as they are mostly plain setters.
Has anyone seen something like this, or got any hints on where to investigate? I failed on my own =).
Thanks for hints,
luk32.
Most likely it is the compiler rearranging the code. I suppose the "new" order still works correctly?
If possible, try to debug with optimizations turned off; that increases the likelihood of the executable staying closer to the source code.
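Something along these lines (file and binary names are guesses based on your session; keep -fopenmp only if you still want OpenMP compiled in while debugging):
g++ -O0 -g -fopenmp main.cpp -o exe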

How do `DO` loops work in Fortran 66?

I'm reading an old book I found in a second-hand book shop (again). This one is called "Fortran techniques - with special reference to non-numerical applications", by A. Colin Day, published by Cambridge University Press in 1972. It is, after all, very important to keep up with the latest in software development ;-)
This book claims to cover Fortran-66 (X3.9-1966), aka Fortran-IV, with a minor departure from that standard for DATA statements which isn't relevant here.
The trouble is, the book seems to leave a lot to guesswork, and my guesses are pretty uncertain WRT the DO loop. This is in chapter 1, so not a very good sign.
Here is one example...
DO 15 I = 1, 87
J = I - 44
In the DO line, 1 and 87 seem to represent the inclusive range for the loop - I takes values 1 to 87 inclusive, so J takes values -43 to +43 inclusive. However, what does the 15 represent?
Another example is...
N = 1
DO 33 I = 1, 10
...
33 N = N + N
In this case, 33 looks like a label or line number - presumably the last line executed before the loop repeats (or exits). But 33 is an odd number to choose just as an arbitrary label.
EDIT That was a mistake - see the answer by duffymo below.
And the very next example after that is...
DO 33 I = 1, 10
N = 2 ** (I-1)
Again using the same 33, but without any line being explicitly labelled with it.
Am I being confused because these are short snippets taken out of context? What does the n in DO n ... represent?
Here is a complete program that should answer some of your questions. One can easily test this history question ... FORTRAN IV is still supported by numerous compilers, though portions of FORTRAN IV are either officially obsolescent or, in my opinion, should be obsolete. I compiled and checked this program with both g77 (which is close to obsolete since it is long unsupported) and gfortran.
Here is a sample program:
      implicit none
      integer i
      real q
      q = 1.0
      do i=1, 10
         q = q * 1.5
      end do
      write (6, *) "modern loop: q =", q
      q = 1.0
      do 100 i=1, 10
         q = q * 1.5
  100 continue
      write (6, *) "loop with continue: q =", q
      q = 1.0
      do 200 i=1, 10
  200 q = q * 1.5
      write (6, *) "loop without continue: q =", q
      stop
      end
And how to compile it with gfortran:
gfortran -ffixed-form -ffixed-line-length-none -std=gnu test_loops.for -o test_loops.exe
Re your question: if you terminate the loop with a labeled line that is executable code, is that line part of the loop? The output of the program clearly shows that the labeled line IS part of the loop. Here is the output of gfortran:
modern loop: q = 57.665039
loop with continue: q = 57.665039
loop without continue: q = 57.665039
The line number labels the terminal statement of the loop - the last statement in the loop's range. Once that statement has executed on the final trip, the loop is complete and control falls through to whatever follows it.
Yes, the numbers are odd, arbitrary, and meaningless. It's part of what made FORTRAN hard to read and understand.
The number 15 is known as a "label"; it was decided by the programmer. Depending on organisational standards these numbers were controlled and followed specific rules, although some programmers didn't keep to the standards and their code was a mess. Comments and line indentation were also part of the standards followed by most.

Gameboy emulator testing strategies?

I'm writing a gameboy emulator, and am struggling with making sure opcodes are emulated correctly. Certain operations set flag registers, and it can be hard to track whether the flag is set correctly, and where.
I want to write some sort of testing framework, but thought it'd be worth asking here for some help. Currently I see a few options:
Unit test each and every opcode with several test cases. The issue is that there are 256 8-bit opcodes and 50+ (can't remember the exact number) 16-bit opcodes. This would take a long time to do properly.
Write some sort of logging framework that logs a stacktrace at each operation and compares it to other established emulators. This would be pretty quick to do, and allows a fairly rapid overview of what exactly went wrong. The log file would look a bit like this:
...
PC = 212 Just executed opcode 7c - Register: AF: 5 30 BC: 0 13 HL: 5 ce DE: 1 cd SP: ffad
PC = 213 Just executed opcode 12 - Register: AF: 5 30 BC: 0 13 HL: 5 ce DE: 1 cd SP: ffad
...
The cons are that I need to modify the source of another emulator to output the same format. And there's no guarantee the opcode is correct, as it assumes the other emulator is.
What else should I consider?
Here is my code if it helps: https://github.com/dbousamra/scalagb
You could use already established test roms. I would recommend Blargg's test roms. You can get them from here: http://gbdev.gg8.se/files/roms/blargg-gb-tests/.
To me the best idea is the one you already mentioned:
- take an existing emulator that is well known and for which you have the source code; let's call it the master emulator
- take some ROMs that you can use to test
- test these ROMs in the emulator that is known to work well
- modify the master emulator so it produces a log while it is running, for each opcode that it executes
- do the same in your own emulator
- compare the output (see the sketch below)
I think this approach has more advantages:
- you will have the log file from a good emulator
- the outcome of the test can be evaluated much faster
- you can use more than one emulator
- you can go deeper later, like putting memory contents into the log and seeing the differences between the two implementations.
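Once both emulators write the identical one-line-per-instruction format, the comparison step can be as crude as (log names are placeholders):
diff master.log mine.log | head -n 20
The first diverging line points you at the first opcode whose emulation disagrees.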

What exactly does C++ profiling (google cpu perf tools) measure?

I'm trying to get started with Google Perf Tools to profile some CPU-intensive applications. It's a statistical calculation that dumps each step to a file using `ofstream`. I'm not a C++ expert so I'm having trouble finding the bottleneck. My first pass gives results:
Total: 857 samples
357 41.7% 41.7% 357 41.7% _write$UNIX2003
134 15.6% 57.3% 134 15.6% _exp$fenv_access_off
109 12.7% 70.0% 276 32.2% scythe::dnorm
103 12.0% 82.0% 103 12.0% _log$fenv_access_off
58 6.8% 88.8% 58 6.8% scythe::const_matrix_forward_iterator::operator*
37 4.3% 93.1% 37 4.3% scythe::matrix_forward_iterator::operator*
15 1.8% 94.9% 47 5.5% std::transform
13 1.5% 96.4% 486 56.7% SliceStep::DoStep
10 1.2% 97.5% 10 1.2% 0x0002726c
5 0.6% 98.1% 5 0.6% 0x000271c7
5 0.6% 98.7% 5 0.6% _write$NOCANCEL$UNIX2003
This is surprising, since all the real calculation occurs in SliceStep::DoStep. The "_write$UNIX2003" (where can I find out what this is?) appears to be coming from writing the output file. Now, what confuses me is that if I comment out all the outfile << "text" statements and run pprof, 95% is in SliceStep::DoStep and "_write$UNIX2003" goes away. However my application does not speed up, as measured by total time. The whole thing speeds up less than 1 percent.
What am I missing?
Added:
The pprof output without the outfile << statements is:
Total: 790 samples
205 25.9% 25.9% 205 25.9% _exp$fenv_access_off
170 21.5% 47.5% 170 21.5% _log$fenv_access_off
162 20.5% 68.0% 437 55.3% scythe::dnorm
83 10.5% 78.5% 83 10.5% scythe::const_matrix_forward_iterator::operator*
70 8.9% 87.3% 70 8.9% scythe::matrix_forward_iterator::operator*
28 3.5% 90.9% 78 9.9% std::transform
26 3.3% 94.2% 26 3.3% 0x00027262
12 1.5% 95.7% 12 1.5% _write$NOCANCEL$UNIX2003
11 1.4% 97.1% 764 96.7% SliceStep::DoStep
9 1.1% 98.2% 9 1.1% 0x00027253
6 0.8% 99.0% 6 0.8% 0x000274a6
This looks like what I'd expect, except I see no visible increase in performance (.1 second on a 10 second calculation). The code is essentially:
ofstream outfile("out.txt");
for loop:
SliceStep::DoStep()
outfile << 'result'
outfile.close()
Update: I'm timing using boost::timer, starting where the profiler starts and ending where it ends. I do not use threads or anything fancy.
From my comments:
The numbers you get from your profiler say that the program should be around 40% faster without the print statements.
The runtime, however, stays nearly the same.
Obviously one of the measurements must be wrong. That means you have to do more and better measurements.
First I suggest starting with another easy tool: the time command. This should give you a rough idea where your time is spent.
If the results are still not conclusive you need a better testcase:
Use a larger problem
Do a warmup before measuring. Do some loops and start any measurement afterwards (in the same process).
Tiristan: It's all in user. What I'm doing is pretty simple, I think... Does the fact that the file is open the whole time mean anything?
That means the profiler is wrong.
Printing 100000 lines to the console using python results in something like:
for i in xrange(100000):
print i
To console:
time python print.py
[...]
real 0m2.370s
user 0m0.156s
sys 0m0.232s
Versus:
time python test.py > /dev/null
real 0m0.133s
user 0m0.116s
sys 0m0.008s
My point is:
Your internal measurements and time show you do not gain anything from disabling output. Google Perf Tools says you should. Who's wrong?
_write$UNIX2003 is probably referring to the write POSIX system call, which performs the actual output (to your file here, or to a terminal). I/O is very slow compared to almost anything else, so it makes sense that your program is spending a lot of time there if you are writing a fair bit of output.
I'm not sure why your program wouldn't speed up when you remove the output, but I can't really make a guess based only on the information you've given. It would be nice to see some of the code, or even the perftools output when the output statements are removed.
Google perftools collects samples of the call stack, so what you need is to get some visibility into those.
According to the doc, you can display the call graph at statement or address granularity. That should tell you what you need to know.
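For example (the binary and profile file names are placeholders for yours):
pprof --text --lines ./statcalc statcalc.prof
--lines attributes samples to individual source lines instead of whole functions; --addresses goes one step finer still.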