ARMv7-M assembly ITEE usage - if-statement

Problem:
I'm trying to get the ITEE EQ (i.e. 'If-Then-Else-Else Equal') block to work when R6 == 0, with the THEN instruction branching to the END label, but the assembler reports an error on the line: BEQ END
Program info:
I'm writing a program related to optimization. I'm using gradient descent to converge to the point where the gradient is 0, in order to find the solution x* that minimises some function f(x). A C program calls an assembly function, which is the program shown here.
Here is my program where the error is:
CMP R6, #0 @ Compare f'(x) with 0
ITEE EQ @ If R6 == 0, Then-Else-Else
BEQ END @ Calls END label if equal
SUBNE R0, R6 @ Change R0(x) in the opp direction of gradient to get lower value of f(x) if not equal
BNE optimize @ Branch to optimize if not equal
This is my first assembly program using the NXP LPC1769 for a school assignment. Do let me know what I'm missing or have done wrong. Thank you!
Here is my whole program:
.syntax unified
.cpu cortex-m3
.thumb
.align 2
.global optimize
.thumb_func
optimize:
@ Write optimization function in assembly language here
MOV R5, INLAMBDA @ R5 holds value of inverse lambda(10) ie to eliminate floating point
LDR R6, #2 @ Load R6 with value '2' ie constant of f'(x)
MUL R6, R1, R6 @ Multiply R6(2) with R1(a) & store to R6(results)
MLA R6, R6, R0, R2 @ Multiply R6(results) with R0(x) & sum with R2(b) to get f'(x). Store & update results to R6
SDIV R6, R5 @ Divide R6(results) by R5(1/lambda) to get f'(x) * lambda
CMP R6, #0 @ Compare f'(x) with 0
ITEE EQ @ If R6 == 0, Then-Else-Else
BEQ END @ Calls END label if equal
SUBNE R0, R6 @ Change R0(x) in the opp direction of gradient to get lower value of f(x) if not equal
BNE optimize @ Branch to optimize if not equal
@ End label
END:
BX LR
@ Define constant values
CONST: .word 123
INLAMBDA: .word 10 @ Inverse lambda 1 / lambda(0.1) is 10

The problem is that BEQ END is in the middle of the IT block. To quote some documentation for IT:
A branch or any instruction that modifies the PC is only permitted in an IT block if it is the last instruction in the block.
That said, because it's a branch, the "else" is implicit anyway - if you take the branch, you won't be executing the following instructions on account of being elsewhere, and if you don't take it you've got no choice but to execute them, so there's no need for them to be explicitly conditional at all. In fact, you don't even need the IT either, since B<cond> has a proper Thumb instruction encoding in its own right. But then you also don't even need that, because you're doing a short forward branch based on a register being zero, and there's a specific Thumb-only compare-and-branch instruction for doing exactly that!
In other words, your initial 5-line snippet can be expressed simply as:
CBZ R6, END
SUB R0, R6
B optimize
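If it helps to see why the explicit "else" conditions are redundant, here is the same control flow modelled in Python. This is only a sketch under assumptions: optimize and sdiv are hypothetical names, the arithmetic follows the register comments in the question (R0 = x, R1 = a, R2 = b, inverse lambda = 10), and 32-bit wraparound is ignored.

```python
def sdiv(n: int, d: int) -> int:
    """Divide rounding toward zero, as SDIV does (Python's // rounds down)."""
    q = abs(n) // abs(d)
    return q if (n >= 0) == (d >= 0) else -q


def optimize(x: int, a: int, b: int, inv_lambda: int = 10) -> int:
    """Step x against the scaled gradient until the scaled step is zero."""
    while True:
        step = sdiv(2 * a * x + b, inv_lambda)  # f'(x) * lambda, in integers
        if step == 0:   # CBZ R6, END : taken only when the step is zero
            return x    # END: BX LR
        x -= step       # SUB R0, R6  : only reached when the branch was not taken
        # B optimize    : unconditional jump back to the top of the loop


if __name__ == "__main__":
    print(optimize(0, 1, -40))
```

With a = 1, b = -40 and a start of x = 0 this model stops at 16 rather than at the true minimum of 20, because the truncating division makes the step zero as soon as |f'(x)| < 10. Nothing here guarantees that the loop terminates for every choice of a and b.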


Printing list in MIPS recursion

Currently, I am working with MIPS, and I was wondering what modifications I should make to the following code (a recursive function to print a list):
printList:
addi $sp, $sp, -8
sw $ra, 0($sp)
beqz $a0, endRec
lw $t0, 0($a0)
sw $t0, 4($sp)
lw $a0, 4($a0)
jal printList
lw $a0, 4($sp)
li $v0, 1
syscall
la $a0, str
li $v0, 4
syscall
endRec:
lw $ra, 0($sp)
addi $sp, $sp, 8
jr $ra
such that it prints the list in "normal" order (for example, if I add the elements 1 2 3 at the end, I want it to print 1 2 3 and not 3 2 1).
NOTE: str is a blank (space) string defined in the data segment.
I also know that I could reverse the list and then call that function, but is there an easier way?
Though you're working in MIPS, this is not really a MIPS problem, it is a general problem of recursion.
In recursion, let's say we have:
recursiveFunction (...) {
if condition then exit
// do some work #1
recursiveFunction (...);
// do some other work #2
}
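The work that is done in the section tagged #1 will happen before the recursion, i.e. on the recursive descent; in some sense this happens forwards.
The work that is done in the section tagged #2 will happen after the recursion, i.e. on the unwinding of the recursion; in some sense this happens backwards.
If you put the printing in section #2, the list will come out backwards. If you put the printing in section #1, the list will come out forwards. In your code the printing syscalls come after jal printList, so they sit in section #2; moving them before the recursive call puts them in section #1 (take care to keep the node pointer around, since $a0 is reused for the syscall arguments).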
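The same effect can be seen in a few lines of Python. This is an illustrative sketch, not a translation of the MIPS code: Node, visit_before and visit_after are made-up names, and the values are collected in a list instead of being printed.

```python
from typing import Optional


class Node:
    """Singly linked list node (hypothetical stand-in for the MIPS list)."""

    def __init__(self, value: int, next: "Optional[Node]" = None):
        self.value = value
        self.next = next


def visit_before(node: Optional[Node], out: list) -> None:
    """Work in section #1: record the value, then recurse."""
    if node is None:
        return
    out.append(node.value)
    visit_before(node.next, out)


def visit_after(node: Optional[Node], out: list) -> None:
    """Work in section #2: recurse first, record the value while unwinding."""
    if node is None:
        return
    visit_after(node.next, out)
    out.append(node.value)


if __name__ == "__main__":
    head = Node(1, Node(2, Node(3)))
    forwards, backwards = [], []
    visit_before(head, forwards)
    visit_after(head, backwards)
    print(forwards, backwards)
```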

How to find the function memory map in ARM ELF file?

I am trying to perform loop bound analysis for ARMv7-M code using Z3 for a big framework.
I would like to find the memory addresses that are used by a certain function inside an .elf file.
For example, in a function foo() I have the basic block below:
ldr r1, [r3, #0x20]
strb r2, [r3, #6] {__elf_header}
str r2, [r3, #0x24] {__elf_header}
str r2, [r3, #0x20] {__elf_header}
mov r3, r1
cmp r1, #0
bne #0x89f6
How can I get the initial memory location used by this function, [r3, #0x20]? Are there memory segments for every function to access, or is it random?
Given that the above basic block is a loop, is there a way to know the memory addresses that will be used during its execution?
Does the compiler, for example, reserve a memory address range from 0x20 to 0x1234 to be accessed only during the execution of such a basic block? In other words, is there a map between a function and the range of memory addresses used by it?
It is unclear what you are asking. First off, why would any linker put the effort into randomizing things? Perhaps there is one that intentionally makes the output not repeatable. But a linker is just a program and normally will do things like process the items on the command line in order, and then process each object from beginning to end; not random.
So far the rest of this seems pretty straightforward: just use the tools. Your comment implies GNU tools? Since this is in part tool specific you should have tagged it as such, as you cannot really make generalizations across all toolchains ever created.
unsigned int one ( void )
{
return(1);
}
unsigned int two ( void )
{
return(2);
}
unsigned int three ( void )
{
return(3);
}
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <one>:
0: e3a00001 mov r0, #1
4: e12fff1e bx lr
00000008 <two>:
8: e3a00002 mov r0, #2
c: e12fff1e bx lr
00000010 <three>:
10: e3a00003 mov r0, #3
14: e12fff1e bx lr
as shown they are all in .text, simple enough.
arm-none-eabi-gcc -O2 -c -ffunction-sections so.c -o so.o
arm-none-eabi-objdump -d so.o
so.o: file format elf32-littlearm
Disassembly of section .text.one:
00000000 <one>:
0: e3a00001 mov r0, #1
4: e12fff1e bx lr
Disassembly of section .text.two:
00000000 <two>:
0: e3a00002 mov r0, #2
4: e12fff1e bx lr
Disassembly of section .text.three:
00000000 <three>:
0: e3a00003 mov r0, #3
4: e12fff1e bx lr
and now each function has its own section name.
So the rest relies heavily on linking, and there is no one linker script: you, the programmer, choose it directly or indirectly, and how the final binary (ELF) is built is a direct result of that choice.
If you have something like this
.text : { *(.text*) } > rom
and nothing else with respect to these functions, then all of them will land in this definition, but the linker script or instructions to the linker can indicate something else, causing one or more to land in its own space.
arm-none-eabi-ld -Ttext=0x1000 so.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000
arm-none-eabi-objdump -d so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00001000 <one>:
1000: e3a00001 mov r0, #1
1004: e12fff1e bx lr
00001008 <two>:
1008: e3a00002 mov r0, #2
100c: e12fff1e bx lr
00001010 <three>:
1010: e3a00003 mov r0, #3
1014: e12fff1e bx lr
and then of course
arm-none-eabi-nm -a so.elf
00000000 n .ARM.attributes
00011018 T __bss_end__
00011018 T _bss_end__
00011018 T __bss_start
00011018 T __bss_start__
00000000 n .comment
00011018 T __data_start
00011018 T _edata
00011018 T _end
00011018 T __end__
00011018 ? .noinit
00001000 T one <----
00000000 a so.c
00080000 T _stack
U _start
00001000 t .text
00001010 T three <----
00001008 T two <----
which is simply because there is a symbol table in the file
Symbol table '.symtab' contains 22 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00001000 0 SECTION LOCAL DEFAULT 1
2: 00000000 0 SECTION LOCAL DEFAULT 2
3: 00000000 0 SECTION LOCAL DEFAULT 3
4: 00011018 0 SECTION LOCAL DEFAULT 4
5: 00000000 0 FILE LOCAL DEFAULT ABS so.c
6: 00001000 0 NOTYPE LOCAL DEFAULT 1 $a
7: 00001008 0 NOTYPE LOCAL DEFAULT 1 $a
8: 00001010 0 NOTYPE LOCAL DEFAULT 1 $a
9: 00001008 8 FUNC GLOBAL DEFAULT 1 two
10: 00011018 0 NOTYPE GLOBAL DEFAULT 1 _bss_end__
11: 00011018 0 NOTYPE GLOBAL DEFAULT 1 __bss_start__
12: 00011018 0 NOTYPE GLOBAL DEFAULT 1 __bss_end__
13: 00000000 0 NOTYPE GLOBAL DEFAULT UND _start
14: 00011018 0 NOTYPE GLOBAL DEFAULT 1 __bss_start
15: 00011018 0 NOTYPE GLOBAL DEFAULT 1 __end__
16: 00001000 8 FUNC GLOBAL DEFAULT 1 one
17: 00011018 0 NOTYPE GLOBAL DEFAULT 1 _edata
18: 00011018 0 NOTYPE GLOBAL DEFAULT 1 _end
19: 00080000 0 NOTYPE GLOBAL DEFAULT 1 _stack
20: 00001010 8 FUNC GLOBAL DEFAULT 1 three
21: 00011018 0 NOTYPE GLOBAL DEFAULT 1 __data_start
but if
arm-none-eabi-strip so.elf
arm-none-eabi-nm -a so.elf
arm-none-eabi-nm: so.elf: no symbols
arm-none-eabi-objdump -d so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00001000 <.text>:
1000: e3a00001 mov r0, #1
1004: e12fff1e bx lr
1008: e3a00002 mov r0, #2
100c: e12fff1e bx lr
1010: e3a00003 mov r0, #3
1014: e12fff1e bx lr
The ELF file format is fairly simple; you can easily write code to parse it, and you do not need a library or anything like that. With simple experiments like these you can easily understand how these tools work.
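As a rough idea of how little code is needed, here is a Python sketch that decodes only the fixed-size header of a 32-bit little-endian ELF file such as so.elf. The function name parse_elf32_header and the returned dictionary keys are my own choices, and nothing beyond the header is handled.

```python
import struct

# Elf32_Ehdr for a little-endian file: 16-byte e_ident followed by 13 fields.
ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")


def parse_elf32_header(data: bytes) -> dict:
    """Decode the header found at offset 0 of a 32-bit little-endian ELF file."""
    if len(data) < ELF32_HEADER.size:
        raise ValueError("too short to hold an ELF32 header")
    (ident, e_type, e_machine, _version, e_entry, _phoff, e_shoff, _flags,
     _ehsize, _phentsize, _phnum, e_shentsize, e_shnum,
     e_shstrndx) = ELF32_HEADER.unpack_from(data)
    if ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    if ident[4] != 1 or ident[5] != 1:  # EI_CLASS = 32-bit, EI_DATA = little-endian
        raise ValueError("not a 32-bit little-endian ELF file")
    return {
        "type": e_type,            # 2 means an executable file
        "machine": e_machine,      # 40 means ARM
        "entry": e_entry,
        "shoff": e_shoff,          # file offset of the section header table
        "shentsize": e_shentsize,
        "shnum": e_shnum,
        "shstrndx": e_shstrndx,    # index of the section name string table
    }
```

Usage would be along the lines of parse_elf32_header(open("so.elf", "rb").read()). The section header table at shoff is the next thing to walk if you want to find .symtab and .strtab.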
How can I get the initial memory used by this function ?
Assuming you mean the initial address, and assuming it is not relocated, you just read it out of the file. Simple.
Are there memory segments for every function to access or is it random ?
As demonstrated above, the command line option you mention later in a comment (it should have been in the question; you should edit the question for completeness) does exactly that: it makes a custom section name per function. (What happens if you have the same non-global function name in two or more objects? You can easily figure this out on your own.)
Nothing is random here; you would need to have a reason to randomize things, for security or otherwise. It is more often preferred that a tool outputs the same or at least similar results each time with the same inputs (some tools will embed a build date/time in the file and that may vary from one build to the next).
If you are not using GNU tools then binutils may still be very useful for parsing and displaying ELF files anyway.
arm-none-eabi-nm so.elf
00011018 T __bss_end__
00011018 T _bss_end__
00011018 T __bss_start
00011018 T __bss_start__
00011018 T __data_start
00011018 T _edata
00011018 T _end
00011018 T __end__
00001000 T one
00080000 T _stack
U _start
00001010 T three
00001008 T two
nm so.elf (x86 binutils not arm)
00001000 t $a
00001008 t $a
00001010 t $a
00011018 T __bss_end__
00011018 T _bss_end__
00011018 T __bss_start
00011018 T __bss_start__
00011018 T __data_start
00011018 T _edata
00011018 T _end
00011018 T __end__
00001000 T one
00080000 T _stack
U _start
00001010 T three
00001008 T two
Or you can build with clang and examine with GNU tools, etc. Obviously disassembly won't work with binutils built for a different architecture, but some tools will.
If this is not what you were asking then you need to re-write your question or edit it so we can understand what you are actually asking.
Edit
I would like to know if there is a map between a function and the range of memory address used by it ?
In general no. The term function implies, but is not limited to, high level languages like C. The machine code clearly has no clue about functions, nor should it, and well optimized code does not necessarily have a single exit point from the function, much less a return marking the end. For architectures like the various ARM instruction sets the return instruction is not the end of the "function"; there is pool data that may follow.
But let's look at what gcc does.
unsigned int one ( unsigned int x )
{
return(x+1);
}
unsigned int two ( void )
{
return(one(2));
}
unsigned int three ( void )
{
return(3);
}
arm-none-eabi-gcc -O2 -S so.c
cat so.s
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global one
.arch armv4t
.syntax unified
.arm
.fpu softvfp
.type one, %function
one:
@ Function supports interworking.
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
add r0, r0, #1
bx lr
.size one, .-one
.align 2
.global two
.syntax unified
.arm
.fpu softvfp
.type two, %function
two:
@ Function supports interworking.
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
mov r0, #3
bx lr
.size two, .-two
.align 2
.global three
.syntax unified
.arm
.fpu softvfp
.type three, %function
three:
@ Function supports interworking.
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
mov r0, #3
bx lr
.size three, .-three
.ident "GCC: (GNU) 10.2.0"
we see this is being placed in the file, but what does it do?
.size three, .-three
One reference says this is used so that the linker can remove the function if it is not used. And I have seen that feature in play so good to know (you could have looked this up just as easily as I did)
So in that context the info is there and you can extract it (lesson for the reader).
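For what it is worth, the .size directive is what fills in the Size column of the symbol table (the 8 shown for one, two and three in the listing earlier). Here is a hedged Python sketch of turning raw symbol table bytes into a name to address range map. It assumes you have already pulled the contents of .symtab and .strtab out of the file, function_ranges is a made-up name, and the range covers only the bytes the assembler attributed to the symbol, not data the function touches at run time.

```python
import struct

# Elf32_Sym: st_name, st_value, st_size, st_info, st_other, st_shndx (16 bytes)
ELF32_SYM = struct.Struct("<IIIBBH")
STT_FUNC = 2


def function_ranges(symtab: bytes, strtab: bytes) -> dict:
    """Map each FUNC symbol name to (start, end), with end exclusive."""
    ranges = {}
    for offset in range(0, len(symtab) - ELF32_SYM.size + 1, ELF32_SYM.size):
        st_name, st_value, st_size, st_info, _other, _shndx = \
            ELF32_SYM.unpack_from(symtab, offset)
        if st_info & 0xF != STT_FUNC:    # low nibble of st_info is the symbol type
            continue
        end_of_name = strtab.index(b"\x00", st_name)
        name = strtab[st_name:end_of_name].decode("ascii")
        start = st_value & ~1            # ARM ELF sets bit 0 for Thumb functions
        ranges[name] = (start, start + st_size)
    return ranges
```

Fed the three FUNC entries from the so.elf symbol table above, this gives one as 0x1000 to 0x1008, two as 0x1008 to 0x1010 and three as 0x1010 to 0x1018. On a stripped file there is no .symtab to feed it.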
And then if you use this gcc compiler option that you mentioned
-ffunction-sections
Disassembly of section .text.one:
00000000 <one>:
0: e2800001 add r0, r0, #1
4: e12fff1e bx lr
Disassembly of section .text.two:
00000000 <two>:
0: e3a00003 mov r0, #3
4: e12fff1e bx lr
Disassembly of section .text.three:
00000000 <three>:
0: e3a00003 mov r0, #3
4: e12fff1e bx lr
[ 4] .text.one
PROGBITS 00000000 000034 000008 00 0 0 4
[00000006]: ALLOC, EXEC
[ 5] .rel.text.one
REL 00000000 0001a4 000008 08 12 4 4
[00000040]: INFO LINK
[ 6] .text.two
PROGBITS 00000000 00003c 000008 00 0 0 4
[00000006]: ALLOC, EXEC
[ 7] .rel.text.two
REL 00000000 0001ac 000008 08 12 6 4
[00000040]: INFO LINK
[ 8] .text.three
PROGBITS 00000000 000044 000008 00 0 0 4
[00000006]: ALLOC, EXEC
[ 9] .rel.text.three
REL 00000000 0001b4 000008 08 12 8 4
[00000040]: INFO LINK
That is giving us a size of the sections.
In general with respect to software compiled or in particular assembled, assume that a function doesn't have boundaries. As you can see above the one function is inlined into the two function, invisibly, so how big is an inlined function within another function? How many instances of a function are there in a binary? Which one do you want to monitor and know the size of, performance of, etc? Gnu has this feature with gcc, you can see if it is there with other languages or tools. Assume the answer is no, and then if you happen to find a way, then that is good.
Does the compiler saves a memory segment to be only accessed by a certain function ?
I have no idea what this means. The compiler doesn't make memory segments; the linker does. How the binary is put into a memory image is a linker thing, not a compiler thing, for starters. Segments are just a way to communicate between tools that these bytes are code (read only, ideally), initialized data, or uninitialized data, perhaps extending to read only data, and then you can make up your own types.
If your ultimate goal is to find the bytes that represent the high level concept of "function" in memory (assuming no relocation, etc) by looking at the elf binary using the gnu toolchain. That is possible in theory.
The first thing we appear to know is that the OBJECT contains this information so that a linker feature can remove unused functions for size. But that does not automatically mean that the output binary from the linker also includes this information. You need to find where this .size lands in the object and then look for that in the final binary.
The compiler turns one language into another, often from a higher level to a lower level, but not always; it depends on the compiler and the input and output languages. C to assembly, or C to machine code, or what about Verilog to C++ for a simulation: is that higher or lower? The terms .text, .data and .bss are not part of the language but more of a habit based on learned experience, and they help, as mentioned, to communicate with the linker so that the output binaries can be more controlled for various targets. Normally, as shown above, the compiler (gcc in this case, since no generalities can be made in this area across all tools and languages, or even all C or C++ tools) puts all the code for all the functions in the source file in one .text section by default. You have to do extra work to get something different. So the compiler in general does not make a "segment" or "memory segment" for each function. You already solved your problem, it seems, by using a command line option that gives every function its own section, and now you have a lot more control over size and location, etc.
Just use the file format and/or the tools. This question or series of questions boils down to: just go look at the ELF file format. This is not a Stack Overflow question, as questions seeking recommendations for external information are not for this site.
Does the compiler for example save a memory location address from 0x20 to 0x1234 to be only accessed during the execution of such basic block ? In another word, Is there a map between a function and the range of memory address used by it ?
"Save"? The compiler does not link; the linker links. Is that memory "only" accessed during the execution of that block? In pure textbook theory yes, but in reality branch prediction, prefetch, or cache line fills can also access that "memory".
Unless doing self-modifying code or using the mmu in interesting ways you do not re-use an address space for more than one function within an application. In general. So function foo() is implemented somewhere and bar() somewhere else. Hand written asm from the good old days you might have foo() branch right into the middle of bar() to save space, get better performance or to make the code harder to reverse engineer or whatever. But compilers are not that efficient, they do their best to turn concepts like functions into first off functional(ly equivalent to the high level code) and then second, if desired smaller or faster or both relative to a straight brute force conversion between languages. So barring inlining and tail (leaf?, I call it tail) optimizations and such, one could say there are some number of bytes at some address that define a compiled function. But due to the nature of processor technology you cannot assume those bytes are only accessed by the processor/chip/system busses only when executing that function.

Debugging access violation error: writing to 2071E05A0 instead of 3071E05A0

Final edit:
Some users on the silverfrost forums directed me very helpfully, to a simplification of the code and a solution.
The issue can be replicated using the following code:
PROGRAM ML14ERROR
INTEGER :: origzn, destzn
INTEGER,PARAMETER :: MXZMA = 1713, LXTZN = 1714, MXAV = 182
INTEGER,PARAMETER :: JTMPREL = 1003, av = 1
REAL(KIND=2) :: RANDOM@
REAL,dimension (1:mxav,lxtzn,lxtzn,JTMPREL:JTMPREL):: znzndaav
DO origzn=1,lxtzn
DO destzn=1,lxtzn
znzndaav(av,origzn,destzn,JTMPREL) = RANDOM@()
END DO
END DO
DO origzn=1,mxzma
DO destzn=1,mxzma
! This is where the error occurs
znzndaav(av,origzn,lxtzn,JTMPREL)=
$ znzndaav(av,origzn,lxtzn,JTMPREL)+
$ znzndaav(av,origzn,destzn,JTMPREL)
ENDDO
ENDDO
WRITE(6,*)'No errors'
END PROGRAM
The issue only arises when MXAV>182, which suggests a memory issue. Indeed, multiplying out the dimensions, 183 * 1714 * 1714 * 4 bytes comes to just over 2GB, exceeding the stack size.
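The arithmetic can be checked quickly. The Python sketch below assumes 4 bytes per default REAL (as the multiplication above implies) and compares against 2**31 bytes; znzndaav_bytes is just a helper name for this illustration.

```python
LXTZN = 1714
BYTES_PER_REAL = 4           # assumed size of a default REAL
TWO_GIB = 2**31              # 2,147,483,648 bytes


def znzndaav_bytes(mxav: int) -> int:
    """Bytes needed for REAL znzndaav(1:mxav, lxtzn, lxtzn, JTMPREL:JTMPREL)."""
    return mxav * LXTZN * LXTZN * 1 * BYTES_PER_REAL


if __name__ == "__main__":
    for mxav in (182, 183, 191):
        size = znzndaav_bytes(mxav)
        print(mxav, size, "over 2 GiB" if size > TWO_GIB else "under 2 GiB")
```

MXAV = 182 needs 2,138,715,488 bytes, which is just under 2**31, while MXAV = 183 needs 2,150,466,672 bytes, which is just over it. That matches the threshold seen in testing.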
A solution would be to use the heap as follows (Fortran 95):
PROGRAM ML14ERROR
INTEGER :: origzn, destzn
INTEGER,PARAMETER :: MXZMA = 1713, LXTZN = 1714, MXAV = 191
INTEGER,PARAMETER :: JTMPREL = 1003, av = 1
REAL(KIND=2) :: RANDOM@
REAL,allocatable :: znzndaav(:,:,:,:)
ALLOCATE( znzndaav(1:mxav,lxtzn,lxtzn,JTMPREL:JTMPREL) )
DO origzn=1,lxtzn
DO destzn=1,lxtzn
znzndaav(av,origzn,destzn,JTMPREL) = RANDOM@()
END DO
END DO
DO origzn=1,mxzma
DO destzn=1,mxzma
! This is where the error occurs
znzndaav(av,origzn,lxtzn,JTMPREL)= &
& znzndaav(av,origzn,lxtzn,JTMPREL)+ &
& znzndaav(av,origzn,destzn,JTMPREL)
ENDDO
ENDDO
DEALLOCATE(znzndaav)
WRITE(6,*)'No errors'
END PROGRAM
Once we do this, we can allocate more than 2GB and the array works fine. The program this small section of code stems from is a few years old, and we've only just now run into the issue because a model we've built is many times larger than any before. As Fortran 77 doesn't allow ALLOCATABLE arrays, we must either reduce stack usage, or port the code - or seek another optimisation.
Edited to add:
I have now put together a git repo which contains reproducible code.
Overview
I have a program that works fine when compiled to 32-bit, but presents an access violation error when compiled and run in 64-bit.
I'm using the Silverfrost Fortran compiler, FTN95 v8.51, though this issue occurs using v8.40 and v8.50.
Sample code
! .\relocmon.inc
INTEGER JTMPREL
PARAMETER(JTMPREL=1003)
REAL znda(lxtzn,JTMPREL:JTMPREL)
REAL zndaav(1:mxav,lxtzn,JTMPREL:JTMPREL)
REAL,dimension (lxtzn,lxtzn,JTMPREL:JTMPREL) :: znznda
REAL mlrlsum(lxtzn,lxtzn)
REAL,dimension (1:mxav,lxtzn,lxtzn,JTMPREL:JTMPREL):: znzndaav
COMMON /DDMON/ znda, znznda, mlrlsum,znzndaav, zndaav
! EOF .\relocmon.inc
! .\relocmon.inc with values
INTEGER JTMPREL
PARAMETER(JTMPREL=1003)
REAL znda(1714,JTMPREL:JTMPREL)
REAL zndaav(1:191,1714,JTMPREL:JTMPREL)
REAL,dimension (1714,1714,JTMPREL:JTMPREL) :: znznda
REAL mlrlsum(1714,1714)
REAL,dimension (1:191,1714,1714,JTMPREL:JTMPREL):: znzndaav
COMMON /DDMON/ znda, znznda, mlrlsum,znzndaav, zndaav
! EOF .\relocmon.inc
! .\main.for
INCLUDE 'relocmon.inc'
REAL,save,dimension(lxtzn,lxtzn,mxav) :: ddfuncval
DO origzn=1,mxzma
IF( zonedef(origzn,JZUSE) )THEN
DO destzn=1,mxzma
IF (zonedef(destzn,JZUSE)) THEN
znznda(origzn,destzn,JTMPREL)=znda(destzn,JTMPREL)*
$ ddfuncval(origzn,destzn,av)
znznda(origzn,lxtzn,JTMPREL)=znznda(origzn,lxtzn,JTMPREL)
$ +znznda(origzn,destzn,JTMPREL)
znzndaav(av,origzn,destzn,JTMPREL)=zndaav(av,destzn,JTMPREL)*
$ ddfuncval(origzn,destzn,av)
! LINE 309 -- where error occurs
znzndaav(av,origzn,lxtzn,JTMPREL)=
$ znzndaav(av,origzn,lxtzn,JTMPREL)
$ +znzndaav(av,origzn,destzn,JTMPREL)
ENDIF
ENDDO
ENDIF
ENDDO
! EOF .\main.for
NB the function zonedef simply checks that a zone is valid for the calculation we want to undertake. This function returns a logical.
Debugging
As I mentioned initially, the 32-bit compiled version of this program works fine. When attempting to run the 64-bit version, the output of the first loop is this:
from sdbg64.exe:
Error: Access Violation reading address
0x00000002071E05A0
main.for: 309
write exception to file:
Access violation (c0000005) at address 43a1f4
Within file ml14.exe
in main in line 309, at address 2b84
RAX = 0000000000000001 RBX = 000000027fff704c RCX = 000000000285e6b8 RDX = 00000002802296cc
RBP = 0000000000400000 RSI = 000000029ba3ad6c RDI = 0000000307695374 RSP = 000000000285be70
R8 = 0000000307695374 R9 = 00000002ffff5040 R10 = 000000029ba3ad6c R11 = 000000030731f0dc
R12 = 000000027fff5584 R13 = 00000002802296cc R14 = 000000028169f3ec R15 = 0000000281660928
43a1f4) addss XMM11,[85b401b4++R14]
For the rest of this... please bear with me. I'm not a trained software engineer or fortran developer by any stretch, so I'm stabbing in the dark a little to troubleshoot.
The value for ZNZNDAAV(1,337,337,1003) is 2.241640, and this is being added to ZNZNDAAV(1,337,1714,1003). This tallies with register XMM11 as detailed in the exception output. This value is at address 000000029BA3BD60. The other value is at address 00000003071E05A0.
IIUC, in relocmon.inc we're setting COMMON /DDMON/ to contain the dimensioned array znzndaav, so if the software were working nominally, the address of the value in question would be within the /DDMON/ block. The address range for /DDMON/ is z'000000027FFF6040' - z'0000000307421150'. If my logic is correct, the violation occurs outside of this block.
It appears to me that the program is attempting to write to 00000002071E05A0 when it should be using 00000003071E05A0.
Can anyone help me determine why this would be the case? There appears to be something systematic about it - could it be mere coincidence?
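To put numbers on the "something systematic" observation, here is a small Python check using only the addresses quoted above. The interpretation in the comments is a guess on my part, not something the debugger output proves.

```python
expected = 0x3071E05A0    # where ZNZNDAAV(1,337,1714,1003) is reported to live
faulting = 0x2071E05A0    # address given in the access violation
block_start, block_end = 0x27FFF6040, 0x307421150   # /DDMON/ range quoted above


def in_block(address: int) -> bool:
    """True if the address lies inside the quoted /DDMON/ range."""
    return block_start <= address <= block_end


gap = expected - faulting

if __name__ == "__main__":
    print(hex(gap), gap == 2**32)                   # the gap is exactly 4 GiB
    print(in_block(expected), in_block(faulting))   # only the expected one is inside
    # Identical low 32 bits plus a 2**32 gap is the pattern one would expect
    # if an offset past 2**31 were handled as a signed 32-bit quantity.
    print(hex(expected & 0xFFFFFFFF), hex(faulting & 0xFFFFFFFF))
```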

How to add comments to gdb output?

When debugging a Windows application with OllyDbg, we can add comments to the assembly language output as follows:
00401020 push ebp ; add comment here
Can we add comments to gdb output just like the way above?
When we input disassemble in gdb, it shows like this:
(gdb) disassemble main
Dump of assembler code for function main:
0x0804841d <+0>: push %ebp
0x0804841e <+1>: mov %esp,%ebp
0x08048420 <+3>: and $0xfffffff0,%esp
0x08048423 <+6>: sub $0x10,%esp
0x08048426 <+9>: movl $0x80484d0,(%esp)
0x0804842d <+16>: call 0x80482f0 <puts@plt>
0x08048432 <+21>: mov $0x0,%eax
0x08048437 <+26>: leave
0x08048438 <+27>: ret
End of assembler dump.
Can we add a comment to the line at 0x0804841d so that gdb's output looks like this:
(gdb) disassemble main
Dump of assembler code for function main:
0x0804841d <+0>: push %ebp ; add comment here
0x0804841e <+1>: mov %esp,%ebp
0x08048420 <+3>: and $0xfffffff0,%esp
0x08048423 <+6>: sub $0x10,%esp
0x08048426 <+9>: movl $0x80484d0,(%esp)
0x0804842d <+16>: call 0x80482f0 <puts@plt>
0x08048432 <+21>: mov $0x0,%eax
0x08048437 <+26>: leave
0x08048438 <+27>: ret
End of assembler dump.
Yes, in the sense that GDB treats anything after # on a command line or in a command file as a comment:
00401020 push ebp ; # add comment here
http://www.chemie.fu-berlin.de/chemnet/use/info/gdb/gdb_16.html
Can we add some comments
No.
Obviously you can save GDB output into a text file and add comments there to your heart's content. But GDB will not display them next time you disas main.
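If you go the text file route, the annotating can be scripted. Below is a minimal Python sketch that works on the saved text only and does not talk to GDB at all; annotate and the comments dictionary are hypothetical names of mine.

```python
import re

# Leading address of a disassembly line, optionally after GDB's "=>" marker.
ADDRESS = re.compile(r"^\s*(?:=>\s*)?(0x[0-9a-fA-F]+)")


def annotate(disassembly: str, comments: dict) -> str:
    """Append '; comment' to every line whose address has an entry in comments."""
    out = []
    for line in disassembly.splitlines():
        match = ADDRESS.match(line)
        note = comments.get(int(match.group(1), 16)) if match else None
        out.append(f"{line} ; {note}" if note else line)
    return "\n".join(out)


if __name__ == "__main__":
    saved = (
        "Dump of assembler code for function main:\n"
        "0x0804841d <+0>: push %ebp\n"
        "0x0804841e <+1>: mov %esp,%ebp\n"
        "End of assembler dump."
    )
    print(annotate(saved, {0x0804841D: "add comment here"}))
```

Keeping the comments keyed by address means they can be reapplied to a fresh dump, as long as the binary has not been rebuilt or loaded at a different address.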

MIPS Assembly - How to clear a bit

I am working on an assignment for a class, and the instructor has been about as clear as mud about how we are supposed to "clear a bit."
The assignment is:
"clr0 takes its parameter value, clears bit zero in that value, and returns the result. e.g. 1011 becomes 1010"
I have tried:
clr0:
andi $v0, $a0, 0
jr $ra
But the value does not set the 0th bit to 0.
What am I doing wrong?
li $t0, 1 # $t0 = 0x00000001
not $t0, $t0 # $t0 = 0xfffffffe
and $v0, $a0, $t0 # $v0 = $a0 with bit 0 cleared
The first two instructions set all bits in $t0 to 1, except bit 0. And-ing a bit with 1 leaves that bit unchanged. And-ing a bit with 0 sets that bit to 0. Thus all bits of $a0 are moved to $v0 unchanged, except for bit 0 which is set to 0.
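The difference between the two masks is easy to check outside of MIPS. Here is a small Python sketch that models 32-bit registers with an explicit mask; clr0 mirrors the assignment's function name, and andi_zero is a made-up name for what the original attempt computes.

```python
MASK32 = 0xFFFFFFFF


def clr0(value: int) -> int:
    """Clear bit 0 of a 32-bit value by ANDing with 0xFFFFFFFE."""
    return value & ~1 & MASK32


def andi_zero(value: int) -> int:
    """What `andi $v0, $a0, 0` computes: every bit is ANDed with 0."""
    return value & 0


if __name__ == "__main__":
    print(bin(clr0(0b1011)))       # bit 0 cleared, other bits kept
    print(bin(andi_zero(0b1011)))  # everything cleared
```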
Let's look at what and does. From Wikipedia:
and $d,$s,$t $d = $s & $t
So $v0 in your code will be set to $a0 ANDed with 0. Well, anything ANDed with 0 is 0. So if you have 32 bit registers, wouldn't you want something like
clr0: lui $a1, 65535 # start constructing the constant, so upper bits of $a1 are all 1
ori $a1, $a1, 65534 # finish constructing the constant, so that only 0th bit is 0
and $v0, $a0, $a1 # mask out the 0th bit of $a0
jr $ra
That would mean that all of the bits, except for the 0th bit, are left as is, and the 0th bit is forced to 0.
I just edited this to take into account that andi only takes a 16 bit constant - so first we construct the constant in a free register, then AND it. I don't have a compiler handy, nor do I remember which MIPS registers are free for the function to party on, but something very similar to this should get the job done.
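To double-check the constant, here is a Python sketch of how the 16-bit immediates combine. The helper names lui, ori, andi and sll simply mirror the MIPS mnemonics, and each models only the one behaviour needed here.

```python
MASK32 = 0xFFFFFFFF


def lui(imm16: int) -> int:
    """Load a 16-bit immediate into the upper half; the lower half is zero."""
    return (imm16 & 0xFFFF) << 16


def ori(reg: int, imm16: int) -> int:
    """OR with a zero-extended 16-bit immediate."""
    return reg | (imm16 & 0xFFFF)


def andi(reg: int, imm16: int) -> int:
    """AND with a zero-extended 16-bit immediate."""
    return reg & (imm16 & 0xFFFF)


def sll(reg: int, shamt: int) -> int:
    """Shift left logical, keeping 32 bits."""
    return (reg << shamt) & MASK32


if __name__ == "__main__":
    print(hex(ori(lui(65535), 65534)))   # lui/ori pair from the code above
    print(hex(sll(0xFFFFFFFF, 1)))       # -1 shifted left by one bit
    print(hex(andi(0xABCD1235, 0xFFFE))) # a lone andi also wipes the upper half
```

The lui/ori pair and the shift of -1 both yield 0xFFFFFFFE. A single andi with 0xFFFE is not enough because its immediate is zero-extended, so the upper 16 bits of the value are cleared as well.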
And here's a small driver program to call clr0
main: li $a0, 5
and $v0, $v0, 0
jal clr0 #after this, investigate the value of $v0.
jr $ra # return to caller
#Here is the program.
#A program to clear a bit. Suppose we have 11 = 1011 (in binary).
#When we clear the last bit,
#it changes from 1011 to 1010, that is 10 (decimal).
#To do this we need a mask.
.data
.text
main:
li $v0, 5
syscall
move $a0, $v0
jal bitmanipulate
#When the bitmanipulate function returns, $v0 still holds
#the result of bitmanipulate.
move $s0, $v0
li $v0, 1
move $a0, $s0
syscall
li $v0, 10 #terminating the program
syscall
bitmanipulate:
#Making the mask: the following two instructions build it.
addi $s0, $zero, -1 #$s0 = -1, which is represented as
#...11111111 (32 ones in a 32-bit register)
sll $s0, $s0, 1 #now $s0 = ...11111110,
#after shifting one bit to the left
and $v0, $a0, $s0 #AND the argument with the mask
#$v0 stores the result of the function
jr $ra