Reading memory in debugging sessions with Eclipse embedded C/C++ - gdb

I am trying to run the following assembly program:
.syntax unified
.thumb
.text
.global main
.balign 4
.thumb_func
.type main, %function
main: MOV r0, #28 @ 1st argument
MOV r1, #21 @ 2nd argument
ADD r1, r1, r0
LDR r2, =adder
STR r1, [r2]
stop: B stop
.data
adder: .word -1
.end
I am using the gdb QEMU armeclipse debugger. The file compiles successfully, and r0 and r1 hold the correct values as I step over the instructions. r2 has a value of 0x186A0, and the memory view (in the traditional rendering) shows that the word at that address is 0xFFFFFFFF, as expected. However, after stepping to the 'stop' label, the memory at the address in r2 has not changed. Why is the store not visible?
Note: I reran the program using the STM32F4xx project template to check whether 'adder' ends up in flash memory at runtime. It turns out that the debugger did not write into RAM: the address of 'adder' in this case is 0x8001140, which is outside the RAM region (which starts at 0x20000000). Could someone explain what could have happened? The behavior is the same even if I explicitly write .section .data instead of just .data.

Related

How are assembly directives instructed?

To elaborate the question on the title, suppose I declared the following array in C++,
int myarr[10];
This disassembles to the following in x86
myarr:
.zero 40
Now, AFAIK this .zero directive is a convention of the assembler and is not an instruction. Then how exactly is this directive translated to x86 (or any other architecture, that is not the emphasis here) instructions? For all we know, the CPU can only execute instructions. So I guess these directives are somehow translated to instructions, am I correct?
I could generalize the question by also asking how .word, .long, etc. are translated into instructions, but I think it is clear.
The output of the assembler is an object module. In the object module are representations of various sections for a program. Each section has a size, some attributes, and possibly some data to be put into the section.
For example, a section may be a few thousand bytes, have attributes indicating it contains instructions for execution, and have data that consists of those instructions. Another section might be several hundred bytes but have no data—it is just space to be allocated when the program starts. Another section might be very big and have non-zero data that contains its initial values when the program starts.
To assemble a .zero 40 directive, the assembler just includes forty bytes of zeros in the section it is currently building. When it writes the final output, it will include those zeros in that section. Data directives like this one, .word, and so on simply tell the assembler what data to put into its output.
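As a rough illustration (a sketch, not part of the original answer), the C snippet below declares the same array as in the question and counts how many of its bytes are nonzero. The helper name count_nonzero_bytes is made up for this example, and the 40-byte figure assumes a 4-byte int, as on the x86 target in the question.

```c
#include <stddef.h>

/* Same declaration as in the question. An object with static storage
 * duration and no initializer starts out zeroed; the compiler's assembly
 * output expresses that as ".zero 40" when int is 4 bytes wide. */
int myarr[10];

/* Hypothetical helper: count the bytes of an object that are not zero. */
size_t count_nonzero_bytes(const void *obj, size_t size)
{
    const unsigned char *bytes = (const unsigned char *)obj;
    size_t count = 0;
    for (size_t i = 0; i < size; i++) {
        if (bytes[i] != 0) {
            count++;
        }
    }
    return count;
}
```

Calling count_nonzero_bytes(myarr, sizeof myarr) before anything writes to the array should return 0: the array is nothing but reserved, zero-filled space, and no instruction was needed to produce it.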
unsigned int stuff[10];
void fun ( void )
{
unsigned int r;
for(r=0;r<10;r++) stuff[r]=r;
}
using ARM...
00000000 <fun>:
0: e3a03000 mov r3, #0
4: e59f2010 ldr r2, [pc, #16] ; 1c <fun+0x1c>
8: e5a23004 str r3, [r2, #4]!
c: e2833001 add r3, r3, #1
10: e353000a cmp r3, #10
14: 1afffffb bne 8 <fun+0x8>
18: e12fff1e bx lr
1c: 00000ffc
Disassembly of section .bss:
00001000 <stuff>:
...
The array stuff is simply data. It is not code, it is not instructions, and it won't become either; the directive you asked about won't become code, it can't, it is data.
If you want to see code (instructions), you need lines of high-level language that act on the data, as shown here. In that case the compiler generates code.
Looking at the compiler's actual output (comments and other non-essentials removed):
fun:
mov r3, #0
ldr r2, .L6
.L2:
str r3, [r2, #4]!
add r3, r3, #1
cmp r3, #10
bne .L2
bx lr
.L7:
.align 2
.L6:
.word stuff-4
...
.comm stuff,40,4
The .comm in this case is how the compiler declared the data that represents the array from the high-level language, and the other stuff is mostly code. The .align is there so that the address of .L6 is aligned and you don't get an alignment fault when you try to read it.
.word is a directive. What you see here is .text vs .data (strictly .bss here, since the array wasn't initialized), even though it is one simple C program with the array and the code right next to each other. Code may live in read-only memory like flash, data needs to be in read/write memory, and at compile time the compiler doesn't know where the data will be relative to the code. So it generates an abstraction: it places a read-only word next to the code, which the linker fills in later. The code is generic and uses whatever the linker puts there. The linker places .text and .bss and then makes that connection in the code.
Labels are directives, if you will, so that the programmer or code generator (the compiler) doesn't have to count instructions or the overall size of instructions to make relative jumps. Let the tools do that for you.
1c: 00000ffc
Disassembly of section .bss:
00001000 <stuff>:
...
Based on the way I linked this (not actually a working) program, stuff is the only data item in it. The linker placed it where I asked, at address 0x1000, then went back and filled in that .word directive with stuff-4, which is 0xFFC, so that the code as compiled works.
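The arithmetic behind that fix-up can be sketched in C. This is only an illustration of the numbers in the listing; the function names are invented here and are not something the toolchain provides.

```c
#include <stdint.h>

/* Value the linker stores in the literal-pool word ".word stuff-4". */
uint32_t literal_pool_word(uint32_t stuff_addr)
{
    return stuff_addr - 4u;
}

/* Address written on loop iteration i (0-based) by "str r3, [r2, #4]!".
 * The pre-indexed form adds 4 to r2 before each store and keeps the sum,
 * which is why the compiler starts r2 one word below the array. */
uint32_t store_address(uint32_t initial_r2, uint32_t i)
{
    return initial_r2 + 4u * (i + 1u);
}
```

With stuff at 0x1000, literal_pool_word gives 0xFFC, the first store lands on 0x1000 (stuff[0]), and the tenth on 0x1024 (stuff[9]).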
Directives are not part of the instruction set but are part of the assembly language. Note that an assembly language is defined by the assembler (the tool), not by the instruction set or target. There are countless different x86 assembly languages, and AT&T vs Intel syntax is not the primary difference: the directives differ, as does how you define a label and how you indicate that a number is hex or decimal. Because the early documentation defined the instructions vaguely, assemblers needed lots of adjectives, if you will, to specify which mov instruction you were actually after; even though those are part of the instruction and not directives, they varied across assembly languages too. ARM, MIPS, and many if not most other targets have had tools created with incompatible assembly languages, and .zero is one of those incompatible things.
In any case an assembly language in question needs to be able to define data and then have a way for code to reference that data in order to make useful programs.
The notion of a one-to-one mapping from lines of assembly language to instructions is very misleading; don't get fooled by it. Today's compilers generate almost as much non-code as code in their output: lots of directives and other information.

Cortex-M4 custom HardFault_Handler

I'm in the process of writing a custom HardFault_Handler for the Cortex M4 -- but for an unknown reason, I am unable to step through any instructions in the handler.
When I break with gdb, I am stuck at the first instruction of the handler. n does not proceed to the next instruction; gdb just starts spinning again until I break. OpenOCD shows that the target is halting repeatedly, but it doesn't appear that any of the code in my exception handler is being executed... yet every time I break, I am in the exception handler.
...
Info : halted: PC: 0x08000240
Info : halted: PC: 0x08000240
Info : halted: PC: 0x08000240
Info : halted: PC: 0x08000240
...
I know that if I break at main, I can step through the code (in C) up until the point I generate the exception and catch it in my custom handler. However, stepping through the instructions in the handler just puts me back at the beginning of the handler.
Here's my handler:
.syntax unified
.thumb
.global HardFault_Handler
.section .text.HardFault_Handler,"ax",%progbits
HardFault_Handler:
.size HardFault_Handler, .-HardFault_Handler
Infinite_Loop:
mov r0, #0x1
mov r1, #0x2
mov r2, #0x3
b Infinite_Loop
.thumb needs to be replaced by .thumb_func.
This directive ensures that the function pointer is valid for Thumb mode. Thumb instructions themselves sit at even (halfword-aligned) addresses, but a function pointer or exception vector that targets Thumb code must have bit 0 set. .thumb_func marks the symbol as a Thumb function, so the address stored in the exception vector gets that bit set (the code address plus 1).
Without this directive, the vector holds an even address, which requests ARM mode. The Cortex-M4 only executes Thumb code, so entering the handler itself causes an exception. In other words, my exception was being preempted with an exception.
Add .thumb_func to explicitly identify this symbol as a Thumb-mode function so the linker can do the right thing.
.syntax unified
.thumb_func
.global HardFault_Handler
.section .text.HardFault_Handler,"ax",%progbits
HardFault_Handler:
.size HardFault_Handler, .-HardFault_Handler
Infinite_Loop:
mov r0, #0x1
mov r1, #0x2
mov r2, #0x3
b Infinite_Loop
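The role of bit 0 can be sketched in C. This is only an illustration: the function names are invented, and 0x08000240 is the halt address from the OpenOCD output above, used here as an example code address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vector-table entry for a Thumb handler: the code address with bit 0 set. */
uint32_t thumb_vector_entry(uint32_t code_addr)
{
    return code_addr | 1u;
}

/* True when a branch target or vector entry selects Thumb state. */
bool selects_thumb(uint32_t target)
{
    return (target & 1u) != 0u;
}

/* Address the core actually fetches from: bit 0 is a state flag,
 * not part of the instruction address. */
uint32_t fetch_address(uint32_t target)
{
    return target & ~1u;
}
```

For a handler whose first instruction is at 0x08000240, the vector should hold 0x08000241. A vector holding 0x08000240 points at the same code but fails the selects_thumb check, which is the situation the missing directive produced.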

Instruction disassembly for ARM

I just set up a Raspberry Pi machine and tried reverse engineering the following piece of code.
#include<stdio.h>
int main() {
printf("this is a test\n");
}
For the most part the following disassembly in gdb seemed to make sense.
0x000083c8 <+0>: push {r11, lr}
0x000083cc <+4>: add r11, sp, #4
0x000083d0 <+8>: ldr r0, [pc, #8] ; 0x83e0 <main+24>
0x000083d4 <+12>: bl 0x82ec <puts>
0x000083d8 <+16>: mov r0, r3
0x000083dc <+20>: pop {r11, pc}
0x000083e0 <+24>: andeq r8, r0, r4, asr r4
However, I fail to understand why the instruction at 0x000083e0 exists. Is that instruction even a part of the main function? Wouldn't the value that was pushed in at 0x000083c8 be popped out into pc, immediately transferring control over to some other location?
Also I tried setting a breakpoint at 0x000083e0 -- I seem to be getting a very strange SEGFAULT. Why would that be?
When this function is called (i.e. when execution begins at instruction 0x000083c8), the link register (LR) should already contain the return address. Fast-forward to 0x000083d8: the function's return value is placed in R0 in accordance with the ARM C calling convention. Then the return address is popped from the stack into the PC, effectively ending execution of this function. This implies that the instruction at 0x000083e0 is not a part of your program, and your inspection should be limited to instructions 0x000083c8 through 0x000083dc.
So to answer your questions:
Correct.
The "instruction" at 0x000083e0 is essentially junk. You may not even have execution and/or access privileges to this memory depending on the specifics of your ARM core (Does it have an MMU, etc?). Thus, a seg fault is a reasonable outcome when attempting to inspect that location.
EDIT: in agreement with comments below, the contents of 0x000083e0 should be interpreted as data, not instructions.
The four bytes at 0x000083e0 aren't junk. They are the data read by the PC-relative load at
0x000083d0 <+8>: ldr r0, [pc, #8] ; 0x83e0 <main+24>
The target is also visible in the comment as ; 0x83e0 <main+24>: in ARM state, pc reads as the address of the instruction plus 8, so 0x83d0 + 8 + 8 = 0x83e0.
The reason is that you need to pass the address of a string to puts, and that address might change during the linking step, so the compiler has to emit code the linker can patch afterwards. Thus the address of the string ends up in the instruction stream, yet outside of any execution path.
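To see why gdb prints an andeq for that word, it may help to pull apart the bit fields the disassembler looks at. The sketch below assumes the word stored at 0x83e0 is 0x00008454, which is what the encoding of andeq r8, r0, r4, asr r4 works out to; the struct and function names are invented for this example.

```c
#include <stdint.h>

/* Fields of an ARM data-processing instruction whose second operand is a
 * register shifted by another register. */
struct dp_fields {
    unsigned cond;       /* bits 31:28, 0 means EQ  */
    unsigned opcode;     /* bits 24:21, 0 means AND */
    unsigned rn;         /* bits 19:16, first operand register */
    unsigned rd;         /* bits 15:12, destination register   */
    unsigned rs;         /* bits 11:8,  shift amount register  */
    unsigned shift_type; /* bits 6:5,   2 means ASR */
    unsigned rm;         /* bits 3:0,   register being shifted */
};

/* Interpret a 32-bit word the way a disassembler would. */
struct dp_fields decode_dp(uint32_t word)
{
    struct dp_fields f;
    f.cond = (word >> 28) & 0xFu;
    f.opcode = (word >> 21) & 0xFu;
    f.rn = (word >> 16) & 0xFu;
    f.rd = (word >> 12) & 0xFu;
    f.rs = (word >> 8) & 0xFu;
    f.shift_type = (word >> 5) & 0x3u;
    f.rm = word & 0xFu;
    return f;
}
```

Decoding 0x00008454 yields cond 0 (eq), opcode 0 (and), rd 8, rn 0, rm 4, shift type 2 (asr) and rs 4, which is exactly the text gdb printed. Read as data, the same word is simply the address 0x8454.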

How to tell clang not to save registers to stack?

The Goal
I'm currently trying out avr-llvm (an LLVM that supports AVR as a target). My main goal is to use its hopefully better optimizer (compared to gcc's) to achieve smaller binaries. If you know a little about AVRs, you know that you have very little memory.
I currently work with an ATTiny45, 4KB Flash and 256 Bytes (just bytes not KB!) of SRAM.
The Problem
I was trying to compile a simple C program (see below) to check what assembly code is produced and how the machine-code size develops. I used "clang -Oz -S test.c" to produce assembly output optimized for minimal size. My problem is the needlessly saved register values, given that this function never returns.
My Questions...
How can I tell llvm that it can just clobber any register, if needed, without saving/restoring its content? Any ideas how to optimize it even more (e.g. a more efficient setup of the stack)?
Details / Example
Here is my test program. As mentioned above it was compiled using "clang -Oz -S test.c".
#include <stdint.h>
void __attribute__ ((noreturn)) main() {
volatile uint8_t res = 1;
while (1) {}
}
As you can see it has just one "volatile" variable of type uint8_t (if I don't set it to volatile everything would be optimized out). This variable is set to 1. And there is an endless loop at the end. Now let us have a look at the assembly output:
.file "test.c"
.text
.globl main
.align 2
.type main,#function
main:
push r28
push r29
in r28, 61
in r29, 62
sbiw r29:r28, 1
in r0, 63
cli
out 62, r29
out 63, r0
out 61, r28
ldi r24, 1
std Y+1, r24
.BB0_1:
rjmp .BB0_1
.tmp0:
.size main, .tmp0-main
Yeah! That's a lot of machine code for such a simple program. I just tested some variations and had a look into the reference manual of the AVR... so I can explain what happens. Let's have a look at each part.
This here is the "beef", which is just doing what our c program is about. It loads r24 with value "1" which is stored into memory at Y+1 (Stack Pointer + 1). And there is of course our endless loop:
ldi r24, 1
std Y+1, r24
.BB0_1:
rjmp .BB0_1
Note that the endless loop is needed. Otherwise the __attribute__ ((noreturn)) is ignored, and the stack pointer and saved registers are restored later.
Just before that the pointer in "Y" is set up:
in r28, 61
in r29, 62
sbiw r29:r28, 1
in r0, 63
cli
out 62, r29
out 63, r0
out 61, r28
What happens here is:
Y (register pair r29:r28) is loaded from ports 61 and 62; these ports map to the "registers" SPL and SPH (the "L"ow and "H"igh bytes of the "S"tack "P"ointer)
the loaded value is decremented by one (sbiw r29:r28, 1)
the changed value of the stack pointer is written back to the ports; I guess to avoid problems, interrupts are disabled first: the status register SREG (port 63), which holds the global interrupt flag, is saved to r0 before cli and later restored through port 63.
This setup of the stack registers seems inefficient. To reserve one byte on the stack I would just need to "push r0". Then I could load the value of SPH/SPL into r29:r28. However, this would probably need changes to LLVM's source code. The code above only makes sense if more than 3 bytes of stack have to be reserved for local variables (even when optimizing with -O3; for -Oz, using pushes makes sense for up to 6 bytes). I guess we would need to touch the LLVM source for that, so this is out of scope.
More interesting is this part:
push r28
push r29
As main() is not intended to return, this doesn't make sense. This just wastes RAM and flash memory for silly instructions (remember: we have only 64, 128 or 256 bytes SRAM available in some devices).
I investigated this a bit further: if we let main return (e.g. no endless loop), the stack pointer is restored, we get a "ret" instruction at the end, AND the registers r28 and r29 are restored from the stack via "pop r29, pop r28". But the compiler should know that if the scope of the function "main" is never left, then all registers can be clobbered without storing them on the stack.
This problem seems just a bit "silly" as we speak about 2 bytes RAM. But just think about what happens if the program starts using the rest of the registers.
All this really changed my view at current "compilers". I thought today there wouldn't be much room for optimization via assembler. But it seems there is...
So, still the question is...
Do you have any idea how to improve this situation (except for filing a bug report / feature request)?
I mean: Are there just some compiler switches I might have overlooked...?
Additional Info
Using __attribute__ ((OS_main)) works for avr-gcc.
Output is as following:
.file "test.c"
__SREG__ = 0x3f
__SP_H__ = 0x3e
__SP_L__ = 0x3d
__CCP__ = 0x34
__tmp_reg__ = 0
__zero_reg__ = 1
.global __do_copy_data
.global __do_clear_bss
.text
.global main
.type main, #function
main:
push __tmp_reg__
in r28,__SP_L__
in r29,__SP_H__
/* prologue: function */
/* frame size = 1 */
ldi r24,lo8(1)
std Y+1,r24
.L2:
rjmp .L2
.size main, .-main
This is (in my opinion) optimal in size (6 instructions or 12 bytes) and also in speed for this sample program. Is there an equivalent attribute for llvm? (clang version '3.2 (trunk 160228) (based on LLVM 3.2svn)' knows neither OS_task nor OS_main.)
The answer to the question asked is somewhat brought up by Anton in his comment: the problem is not in LLVM, it is in your AVR target. For example, here is an equivalent program run through Clang and LLVM for other targets:
% cat test.c
__attribute__((noreturn)) int main() {
volatile unsigned char res = 1;
while (1) {}
}
% ./bin/clang -c -o - -S -Oz test.c # I'm on an x86-64 machine
<snip>
main: # @main
.cfi_startproc
# BB#0: # %entry
movb $1, -1(%rsp)
.LBB0_1: # %while.body
# =>This Inner Loop Header: Depth=1
jmp .LBB0_1
.Ltmp0:
.size main, .Ltmp0-main
.cfi_endproc
% ./bin/clang -c -o - --target=armv6-unknown-linux-gnueabi -S -Oz test.c
<snip>
main:
sub sp, sp, #4
mov r0, #1
strb r0, [sp, #3]
.LBB0_1:
b .LBB0_1
.Ltmp0:
.size main, .Ltmp0-main
% ./bin/clang -c -o - --target=powerpc64-unknown-linux-gnu -S -Oz test.c
<snip>
main:
.align 3
.quad .L.main
.quad .TOC.@tocbase
.quad 0
.text
.L.main:
li 3, 1
stb 3, -9(1)
.LBB0_1:
b .LBB0_1
.long 0
.quad 0
.Ltmp0:
.size main, .Ltmp0-.L.main
As you can see for all three of these targets, the only code generated is to reserve stack space (if necessary, it isn't on x86-64) and set the value on the stack. I think this is minimal.
That said, if you do find problems with LLVM's optimizer, the best way to get help is to send email to the development mailing list or to file bugs if you have a specific input IR sequence that should produce more minimal output IR.
Finally, to answer the questions asked in comments on your question: there are actually areas where LLVM's optimizer is significantly more powerful than GCC. However, there are also areas where it is significantly less powerful. =] Benchmark the code you care about.

Strange behaviour of ldr [pc, #value]

I was debugging some C++ code (WinCE 6 on an ARM platform),
and I find some behavior strange:
4277220C mov r3, #0x93, 30
42772210 str r3, [sp]
42772214 ldr r3, [pc, #0x69C]
42772218 ldr r2, [pc, #0x694]
4277221C mov r1, #0
42772220 ldr r0, [pc, #0x688]
The line 42772214 ldr r3, [pc, #0x69C] is used to get some constant from the .data section, at least I think so.
What is strange is that, according to the code, r3 should be filled with memory from address pc + 0x69C = 0x42772214 + 0x69C = 0x427728B0, but according to the memory contents it is loaded from 0x427728B8 (8 bytes further on). The same happens for the other ldr instructions.
Is it a fault of the debugger or of my understanding of ldr/pc?
Another thing I don't get: why is access to the .data section relative to the executed code? I find it a little strange.
And one more issue: I cannot find the syntax of the first mov instruction (could anyone point me to an opcode specification for the Thumb (1C2)?)
Sorry for the layman's description, I'm just getting familiar with assembly.
This is correct. When pc is used for reading there is an 8-byte offset in ARM mode and 4-byte offset in Thumb mode.
From the ARM-ARM:
When an instruction reads the PC, the value read depends on which instruction set it comes from:
For an ARM instruction, the value read is the address of the instruction plus 8 bytes. Bits [1:0] of this value are always zero, because ARM instructions are always word-aligned.
For a Thumb instruction, the value read is the address of the instruction plus 4 bytes. Bit [0] of this value is always zero, because Thumb instructions are always halfword-aligned.
This way of reading the PC is primarily used for quick, position-independent addressing of nearby instructions and data, including position-independent branching within a program.
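Applied to the listing in the question, the rule can be written down as a small C sketch. The function names are invented for this illustration, and only the ARM-state literal load is modelled.

```c
#include <stdint.h>

/* Value an ARM-state instruction sees when it reads the PC. */
uint32_t pc_read_arm(uint32_t insn_addr)
{
    return insn_addr + 8u;
}

/* Value a Thumb-state instruction sees when it reads the PC. */
uint32_t pc_read_thumb(uint32_t insn_addr)
{
    return insn_addr + 4u;
}

/* Address accessed by an ARM-state "ldr rX, [pc, #offset]". */
uint32_t arm_ldr_literal_address(uint32_t insn_addr, uint32_t offset)
{
    return pc_read_arm(insn_addr) + offset;
}
```

For the three loads in the question this gives 0x427728B8, 0x427728B4 and 0x427728B0: the first matches the address seen in the debugger, and together they point at three consecutive words stored after the code.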
There are 2 reasons for pc-relative addressing.
Position-independent code, which is your case.
Loading constants that cannot be encoded in 1 simple instruction, e.g. mov r3, #0x12345678 is impossible to encode in 1 instruction, so the compiler may put this constant at the end of the function and use e.g. ldr r3, [pc, #0x50] to load it instead.
mov r3, #0x93, 30 is most likely the ARM immediate form "8-bit value, rotation": 0x93 rotated right by 30 bits, which gives 0x24C. (Rotating left by 30 bits would give 0xC0000024 instead.)
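The two possible readings of the rotation can be compared with a short C sketch. It assumes the usual ARM encoding of data-processing immediates (an 8-bit value rotated right by an even number of bits); the function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits, 0 <= n < 32. */
uint32_t ror32(uint32_t value, unsigned n)
{
    assert(n < 32u);
    if (n == 0u) {
        return value; /* avoid shifting by 32, which is undefined in C */
    }
    return (value >> n) | (value << (32u - n));
}

/* Expand an ARM data-processing immediate written as "#imm8, rotation". */
uint32_t arm_expand_imm(uint32_t imm8, unsigned rotation)
{
    assert(imm8 <= 0xFFu);
    assert(rotation % 2u == 0u);
    return ror32(imm8, rotation);
}
```

arm_expand_imm(0x93, 30) evaluates to 0x24C, while ror32(0x93, 2), which is the same as rotating left by 30, evaluates to 0xC0000024.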