Trace32 CMM script: understanding the Data.Set command

What does the following command mean?
sYmbol.NEW _VectorTable 0x34000000
sYmbol.NEW _ResetVector 0x34000020
sYmbol.NEW _InitialSP 0x34000100
Data.Set EAXI:_VectorTable %Long _InitialSP _ResetVector+1

The command Data.Set writes data values to your target's memory. The syntax of the command is
Data.Set <address>|<address_range> [<access_width>] {value(s)}
The <address> to which the data is written has the form:
<access_class>:<address_offset>
A full address, just the address offset, or the values you want to write can each be given as debug symbols. Such symbols are usually the variables, function names and labels defined in your target application, and the debugger learns them when you load the application's ELF file.
In this case, however, the symbols are declared manually in the debugger with the command sYmbol.NEW.
Either way, replacing the symbols with their values in the command Data.Set EAXI:_VectorTable %Long _InitialSP _ResetVector+1 gives the command
Data.Set EAXI:0x34000000 %Long 0x34000100 0x34000021
So what does this command actually do?
The access-width specifier %Long indicates that 32-bit values should be written. As a result, the address is incremented automatically by 4 for each specified data value.
The value 0x34000100 is written to address EAXI:0x34000000
The value 0x34000021 is written to address EAXI:0x34000004
The <access_class> "EAXI" indicates that the debugger should access the address 0x34000000 directly via the AXI bus (Advanced eXtensible Interface). By writing directly to the AXI bus, you bypass your target's CPU core, and with it any MMU, MPU or caches. The leading 'E' of the access class EAXI indicates that the write operation may also be performed while the CPU core is running, or is considered to be running (e.g. in Prepare mode). The available access classes are specific to the target's core architecture, so you can find their description in the debugger's "Target Architecture Manual".
And what does this exactly mean for your target and the application running on it?
Well, I don't know your chip or SoC (nor do I know your application).
But from the data I see, I guess that you are debugging a chip with an ARM architecture, probably a Cortex-M. Your chip's boot ROM seems to start at address 0x34000000, while your actual application's start-up code starts at 0x34000020 (maybe with the symbol _start).
For Cortex-M cores, the vector table (here in the boot ROM) must hold the initial value of the stack pointer at offset 0 and the initial value of the program counter at offset 4. In your case the program counter should be initialized with 0x34000021. Why 0x34000021 and not 0x34000020? Because your start-up code is probably encoded in Thumb (a Cortex-M can only execute Thumb code). Setting the least significant bit of the initial program counter value to 1 tells the core to start decoding Thumb instructions. (Not setting the least significant bit to 1 on a Cortex-M will cause an exception.)
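As a rough host-side illustration (this is C, not TRACE32 PRACTICE code), the effect of the expanded command can be modelled as follows. The helper names write_longs and set_vectors and the mem[] array standing in for target memory are assumptions made up for this sketch; only the three addresses come from the question.

```c
#include <stdint.h>
#include <stddef.h>

#define VECTOR_TABLE 0x34000000u  /* _VectorTable from the question */
#define RESET_VECTOR 0x34000020u  /* _ResetVector from the question */
#define INITIAL_SP   0x34000100u  /* _InitialSP from the question   */

/* Hypothetical model of "Data.Set <addr> %Long v0 v1 ...": each value is
 * stored in the next 32-bit slot. mem[] stands in for target memory that
 * starts at address base. */
void write_longs(uint32_t *mem, uint32_t base, uint32_t addr,
                 const uint32_t *values, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        mem[(addr - base) / 4u + i] = values[i];
    }
}

/* Fill the first two vector table entries the way the command does:
 * offset 0 = initial stack pointer, offset 4 = reset vector with bit 0 set. */
void set_vectors(uint32_t *mem)
{
    const uint32_t values[2] = { INITIAL_SP, RESET_VECTOR + 1u };
    write_longs(mem, VECTOR_TABLE, VECTOR_TABLE, values, 2);
}
```

After set_vectors runs, slot 0 holds the stack pointer value and slot 1 holds the Thumb-tagged entry address, which mirrors the two writes listed above.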

Related

How to execute separate compiled binary file from inside program on MCU?

I have an MCU (say an STM32) running, and I would like to 'pass' it a separately compiled binary file over UART/USB and use it like calling a function, where I can pass it data and collect its output. After it completes, a second, different binary would be sent to be executed, and so on.
How can I do this? Does this require an OS be running? I'd like to avoid that overhead.
Thanks!
The exact call mechanism is somewhat specific to the MCU, but you are just making a function call. You can try the function pointer approach, but that has been known to fail with Thumb on gcc (the STM32 uses ARM's Thumb instruction set).
First, decide in your overall system design whether you want to use one specific address for this code, for example 0x20001000, or whether you want several of these resident at the same time, loaded at any one of multiple possible addresses. This determines how you link the code. Is the code standalone, with its own variables, or does it need to call functions in other code? All of this determines how you build it. The easiest option, at least for a first try, is a fixed address: build it like your normal application, but based at a RAM address like 0x20001000, and then load the program sent to you at that address.
In any case, the normal way to "call" a function in Thumb (say on an STM32) is the bl or blx instruction. In this situation you would normally use bx, but to make it a call you need a return address. The way ARM/Thumb works is that for bx and related instructions the lsbit of the target determines the mode you switch to or stay in when branching: lsbit set is Thumb, lsbit clear is ARM. This is all documented in the ARM documentation, which completely covers your question, BTW.
Gcc, and I assume llvm, struggles to get this right, and then some users know enough to be dangerous and do the worst thing: ADDing one (rather than ORRing one), or putting the one into the address themselves. Sometimes putting the one there helps the compiler (this is if you try the function pointer approach and hope the compiler does all the work for you, the *myfun = 0x10000 kind of thing). But it has been shown on this site that, after subtle changes to the code or depending on the exact situation, the compiler will get it right or wrong, and without looking at the generated code you cannot tell whether you have to help with the orr one thing. As with most things, when you need an exact instruction, just write it in asm yourself (real asm, not inline, please); it makes your life 10000 times easier and your code significantly more reliable.
So here is my trivial solution, extremely reliable, port the asm to your assembly language.
.thumb
.thumb_func
.globl HOP
HOP:
bx r0
In C it looks like this
void HOP ( unsigned int );
Now if you loaded to address 0x20001000 then after loading there
HOP(0x20001000|1);
Or you can
.thumb
.thumb_func
.globl HOP
HOP:
orr r0,#1
bx r0
Then
HOP(0x20001000);
The compiler generates a bl to hop which means the return path is covered.
If you want to send say a parameter...
.thumb
.thumb_func
.globl HOP
HOP:
orr r1,#1
bx r1
void HOP ( unsigned int, unsigned int );
HOP(myparameter,0x20001000);
Easy and extremely reliable, compiler cannot mess this up.
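To see why ORing the one in is safer than ADDing it, here is a small host-side sketch. The function names are made up for illustration; the point is only the arithmetic on the address value.

```c
#include <stdint.h>

/* Force the Thumb bit on a branch target. OR is idempotent: an address
 * that already has bit 0 set is left unchanged. */
uint32_t thumb_target_or(uint32_t addr)
{
    return addr | 1u;
}

/* The fragile variant: adding one only gives the right result if bit 0
 * was clear. If the toolchain already set it, the address is corrupted. */
uint32_t thumb_target_add(uint32_t addr)
{
    return addr + 1u;
}
```

With 0x20001000 both variants give 0x20001001, but if the value already carries the Thumb bit (0x20001001), the OR variant leaves it alone while the ADD variant produces 0x20001002, an even address that bx would treat as ARM code.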
If the main app and the downloaded app need to share functions and global variables, there are a few solutions, and they all involve resolving addresses. If the loaded app and the main app are not linked at the same time (doing a copy-and-jump with a single link is generally painful and should be avoided, but...), then, as with any shared library, you need a mechanism for resolving addresses. If the downloaded code has several functions and global variables, and/or your main app has several functions and global variables that the downloaded library needs, you have to solve this. Essentially, one side has to provide a table of addresses in a format both sides agree on. It could be a simple array of addresses, where both sides know which address is which purely from its position. Or you create a list of addresses with labels and search through the list, matching names to addresses for everything you need to resolve. You could, for example, use the approach above to call a setup function that you pass an array/structure to (structures across compile domains are of course a very bad thing). That function then sets up all the local function pointers and variable pointers into the main app, so that subsequent functions in the downloaded library can call the functions in the main app. And/or, vice versa, this first function can pass back an array/structure of all the things in the library.
Alternatively, there could be an array/structure at a known offset in the downloaded library, for example in its first words/bytes. With one or the other or both, the main app can find all the function addresses and variables of the library, and/or the library can be given the main application's function addresses and variables, so that when one calls the other it all works. This of course means function pointers and variable pointers in both directions. Think about how .so files or .dlls work in Linux or Windows; you have to replicate that yourself.
Or you go the path of linking at the same time. Then the downloaded code has to be built along with the code being run, which is probably not desirable, but some folks do this, for example to load code from flash to RAM for various reasons. It is a way to resolve all the addresses at build time: you extract part of the binary from the final build separately and pass it around later.
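The positional table-of-addresses idea can be sketched on a host like this. Everything here (the slot names, app_add, app_negate, module_setup, module_work) is hypothetical and only shows the shape of the mechanism; on a real target the "module" half would live in the separately built binary and module_setup would be reached through something like HOP.

```c
#include <stddef.h>

/* Generic slot type. A function pointer may be converted to another
 * function pointer type and back, as long as it is called through its
 * real type. */
typedef void (*generic_fn)(void);

/* Slot order both sides agree on (the "known by position" scheme). */
enum { API_ADD = 0, API_NEGATE = 1, API_COUNT = 2 };

/* Stand-ins for services the main application exports. */
static int app_add(int a, int b) { return a + b; }
static int app_negate(int a)     { return -a; }

/* Main-app side: fill the table in the agreed order. */
void app_fill_table(generic_fn table[API_COUNT])
{
    table[API_ADD]    = (generic_fn)app_add;
    table[API_NEGATE] = (generic_fn)app_negate;
}

/* Module side: typed pointers cached by the setup function. */
static int (*mod_add)(int, int);
static int (*mod_negate)(int);

/* Module side: first function called after loading. Returns 0 on success,
 * -1 if the table is missing or too short for the agreed layout. */
int module_setup(const generic_fn *table, size_t count)
{
    if (table == NULL || count < API_COUNT) {
        return -1;
    }
    mod_add    = (int (*)(int, int))table[API_ADD];
    mod_negate = (int (*)(int))table[API_NEGATE];
    return 0;
}

/* Module side: later functions call back into the main app. */
int module_work(int x, int y)
{
    return mod_negate(mod_add(x, y));
}
```

Passing the slot count along with the table is a cheap guard against the two sides having been built against different versions of the layout.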
If you do not want a fixed address, then you need to build the downloaded binary as position independent, and you should link that with .text and .bss and .data at the same address.
MEMORY
{
hello : ORIGIN = 0x20001000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > hello
.rodata : { *(.rodata*) } > hello
.bss : { *(.bss*) } > hello
.data : { *(.data*) } > hello
}
You should obviously do this anyway, but with position independence you then have it all packed in along with the GOT (you might need a .got entry, but I think it knows to use .data). Note: with GNU at least, if you put .data after .bss and ensure you have at least one .data item (even a bogus variable you do not use), then .bss is zero padded and allocated for you in the binary, so there is no need to set it up in a bootstrap.
If you build for position independence then you can load it almost anywhere, clearly on arm/thumb at least on a word boundary.
In general for other instruction sets the function pointer thing works just fine. In ALL cases you simply look at the documentation for the processor and see the instruction(s) used for calling and returning or branching and simply use that instruction, be it by having the compiler do it or forcing the right instruction so that you do not have it fail down the road in a re-compile (and have a very painful debug). arm and mips have 16 bit modes that require specific instructions or solutions for switching modes. x86 has different modes 32 bit and 64 bit and ways to switch modes, but normally you do not need to mess with this for something like this. msp430, pic, avr, these should be just a function pointer thing in C should work fine. In general do the function pointer thing then see what the compiler generates and compare that to the processor documentation. (compare it to a non-function pointer call).
If you do not know these basic concepts (function pointers in C, linking a bare-metal app for an MCU/processor, bootstrap, .text, .data, etc.), you need to go learn all that.
The times you decide to switch to an operating system are when you need a filesystem, networking, or a few things like that which you just do not want to do yourself. Sure, there is lwIP for networking, and there are some embedded filesystem libraries. Multithreading is another reason for an OS. But if all you want to do is generate a branch/jump/call instruction, you do not need an operating system for that. Just generate the call/branch/whatever.
Loading and executing a fully linked binary and loading and calling a single function (and returning to the caller) are not really the same thing. The latter is somewhat complicated and involves "dynamic linking", where the loaded code effectively executes in the same execution environment as the caller.
Loading a complete stand-alone executable, on the other hand, is more straightforward and is the function of a bootloader. A bootloader loads and jumps to the loaded executable, which then establishes its own execution environment. Returning to the bootloader requires a processor reset.
In this case it would make sense to have the bootloader load and execute code in RAM if you are going to be frequently loading different code. However be aware that on Harvard Architecture devices like STM32, RAM execution may slow down execution because data and instruction fetch share the same bus.
The actual implementation of a bootloader will depend on the target architecture, but for Cortex-M devices it is fairly straightforward and dealt with elsewhere.
STM32 actually includes an on-chip bootloader (you need to configure the boot source pins to invoke it), which I believe can load and execute code in RAM. It is normally used to load a secondary bootloader to load and program flash, but it can be used for loading any code.
You do need to build and link your code to run from RAM at the address where the loader places it or, if supported, build position-independent code that can run from anywhere.

Trace32 practice script: DATA.SET how to use

What does the following command mean? What does EA mean?
&HEAD=0x146BF94C
DATA.SET EA:&HEAD+0x4 %LONG DATA.LONG(EA:&HEAD+0x4)&0xFFFFFF
The command Data.Set writes raw data to your target's memory at the given address.
The command follows this schema:
  Data.Set <address> <access width> <data>
where
<address> has the form <access class>:<address offset>, where the "access class" consists of several letters specifying which memory is accessed and in which way.
<access width> is %Byte for 8-bit, %Word for 16-bit, %Long for 32-bit or %Quad for 64-bit
<data> is the data you actually want to write.
For the "access class", check the chapter Access Classes in your Processor Architecture Manual (menu → Help → Processor Architecture Manual). The available access classes vary with the processor architecture (e.g. different classes for ARM and PowerPC).
The "access class" EA: means:
Access the memory while the CPU is running (E).
Access the memory via absolute (physical) memory addresses (A) bypassing the MMU.
Finally, the data (<data>) you want to write to the memory can be a fixed value (e.g. 0x42) or calculated via an expression (0x40+0x02). Such an expression can also use so-called "PRACTICE functions". The function used in your example is Data.Long(<address>), which reads a 32-bit value from the given address.
(Note: Expressions may not contain blanks.)
And then you have a macro, &HEAD, which contains the string "0x146BF94C". Any &HEAD appearing in a later command gets replaced by the content of the macro. This is similar to the C preprocessor.
Thus, your commands
&HEAD=0x146BF94C
DATA.SET EA:&HEAD+0x4 %LONG DATA.LONG(EA:&HEAD+0x4)&0xFFFFFF
have the same meaning as
Data.Set EA:0x146BF950 %LONG Data.Long(EA:0x146BF950)&0x00FFFFFF
and that actually defines a read-modify-write on the 32-bit value at address EA:0x146BF950: the value is read from memory, the upper 8 bits are set to zero, and then the result is written back to the same memory location.
It has (almost) the same meaning as the C statement
*((volatile uint32_t*) 0x146BF950) &= 0x00FFFFFF;
It is just "almost the same" because the C code expression would not bypass the MMU, like your Data.Set command does, thanks to the "A" in the memory access class of the addresses.
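To check the arithmetic on a host without touching the fixed target address, the two steps (macro substitution, then masking) can be sketched like this. The function names are illustrative only, and the word is passed in by pointer instead of being read from 0x146BF950.

```c
#include <stdint.h>

/* Address after macro substitution: &HEAD+0x4. */
uint32_t head_plus_4(uint32_t head)
{
    return head + 0x4u;
}

/* Same read-modify-write as the Data.Set line, applied to a word supplied
 * by the caller: read, clear the upper 8 bits, write back. */
void clear_top_byte(volatile uint32_t *word)
{
    *word &= 0x00FFFFFFu;
}
```

For &HEAD=0x146BF94C the first function gives 0x146BF950, and a word such as 0xDEADBEEF becomes 0x00ADBEEF after the second one.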

What's the meaning of HIGHLOW in a disassembled binary file?

I just used DUMPBIN for the first time and I see the term HIGHLOW repeatedly in the output file:
BASE RELOCATIONS #7
11000 RVA, E0 SizeOfBlock
...
3B5 HIGHLOW 2001753D ___onexitbegin
3C1 HIGHLOW 2001753D ___onexitbegin
...
I'm curious what this term stands for. I didn't find anything on Google or Stackoverflow about it.
To apply a fixup, a delta is calculated as the difference between the
preferred base address, and the base where the image is actually
loaded.
The basic idea is that when doing a fixup at some address, we must know
what memory must be changed ("offset" field)
what value is needed for its relocation ("delta" value)
which parts of relocated data and delta value to use ("type" field)
Here are some possible values of the "type" field
HIGH - add higher word (16 bits) of delta to the 16-bit value at "offset"
LOW - add lower word of delta to the value at "offset"
HIGHLOW - add full delta to the 32-bit value at "offset"
In other words, HIGHLOW type tells the program that it's doing a fix-up on offset "offset" from the page of this relocation block*, and that there is a doubleword that needs to be modified in order to have properly working executable.
* all of the relocation entries are grouped into blocks, and every block has a page on which its entries are applied
Let's say that you have this instruction in your code:
section .data
message: db "Hello World!", 0
section .code
...
mov eax, [message]
...
You run assembler and immediately after it you run disassembler. Now your code looks like this:
mov eax, dword [0x702000]
You're now curious why the address is based at 0x700000, and when you look into the file dump, you see that
ImageBase: 0x00700000
Now you understand where this number came from, and you're ready to run the executable.
The loader, which loads executable files into memory and creates an address space for them, finds out that the memory at 0x700000 is unavailable and that it needs to place the file somewhere else. It decides that 0xf00000 will be OK and copies the file contents there.
But your program was linked to work only with data at 0x700000, and there was no way for the linker to know that its output would be relocated. Because of this, the loader must do its magic. It
1. calculates the delta value: the preferred address (image base) is 0x700000, but the image is actually loaded at 0xf00000. It subtracts one from the other and gets 0x800000 as the result.
2. goes to the .reloc section of the file
3. checks whether there is another page (4 KB of data) to be relocated. If not, it continues toward calling the file's entry point.
4. for every relocation in the current page, it
   gets the data at the relocation offset
   adds the delta value (in the way the type field states)
   places the new value at the relocation offset
5. continues with step 3
There are also more types of relocation entry and some of them are architecture-specific. To see a full list, read the "Microsoft Portable Executable and Common Object File Format, section 6.6.2. Fixup Types".
What you see here is the content of the "Base relocation table" in Microsoft Windows executable files.
Base relocation tables are necessary in Windows for DLL files and they are optional for executable files; they contain information about the location of address information in the EXE/DLL file that must be updated when the actual address of the DLL file in memory is known (when loading the DLL into memory). Windows uses the information stored in this table to update the address information.
The table supports different types of addresses while the naming is Microsoft-specific: ABSOLUTE (= dummy), HIGH, LOW, HIGHLOW, HIGHADJ and MIPS_JMPADDR.
The full name of the constant is "IMAGE_REL_BASED_HIGHLOW".
The "ABSOLUTE" type is typically a dummy entry inserted to ensure the parts of the table are a multiple of 4 (or 8) bytes long.
On x86 CPUs only the "HIGHLOW" type is used: It tells Windows about the location of an absolute (32-bit) address in the file.
Some background info:
In your example the "Image Base" could be 0x20000000 which means that the EXE/DLL file has been compiled to be loaded into address 0x20000000. At the addresses 0x200113B5 (0x20000000 + 0x11000 + 0x3B5) and 0x200113C1 there are absolute addresses.
Let's say the memory at location 0x200113B5 contains the value 0x20012345 which is the address of a function or variable in the program.
Maybe the memory at address 0x20000000 cannot be used and Windows decides to load the DLL into the memory at 0x50000000 instead. Then the 0x20012345 must be replaced by 0x50012345.
The information in the base relocation table is used by Windows to find all addresses that must be replaced.
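A minimal sketch of what applying HIGHLOW fixups amounts to, written as host-side C. It assumes the relocation entries of one block have already been decoded into plain page offsets; the function name apply_highlow and its parameters are made up for this illustration and are not part of any Windows API.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Apply HIGHLOW fixups for one relocation block.
 *   image    - bytes of the image as loaded into memory
 *   page_rva - RVA of the page this block belongs to (e.g. 0x11000)
 *   offsets  - offsets within that page (e.g. 0x3B5, 0x3C1)
 *   delta    - actual load address minus preferred image base */
void apply_highlow(uint8_t *image, uint32_t page_rva,
                   const uint16_t *offsets, size_t count, uint32_t delta)
{
    for (size_t i = 0; i < count; i++) {
        uint8_t *site = image + page_rva + offsets[i];
        uint32_t value;
        memcpy(&value, site, sizeof value);  /* fixup sites may be unaligned */
        value += delta;                      /* HIGHLOW: add the full delta */
        memcpy(site, &value, sizeof value);
    }
}
```

With the numbers from the example above, a delta of 0x50000000 - 0x20000000 = 0x30000000 turns the stored 0x20012345 into 0x50012345.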

Why this branch instruction of ARM doesn't work

Now I am writing a library to mock the trivial function for C/C++. It is used like this: MOCK(mocked, substitute)
If you call the mocked function, the substitute function will be called instead.
I modify the attribute of code page and inject the jump code into the function to implement it. I have implemented it for x86 CPU and I want to port it to ARM CPU. But I have a problem when I inject binary code.
For example, the address of substitute function is 0x91f1, and the address of function to mock is 0x91d1. So I want to inject the ARM branch code into 0x91d1 to jump to the substitute function.
According to the document online, the relative address is
(0x91f1 - (0x91d1 + 8)) / 4 = 6
so the binary instruction is:
0xea000006
Because my ARM emulator (I use the Android ARMv7 emulator) is little endian, the binary code to inject is:
0x060000ea
But when I executed the mocked function after injecting the branch code, a segmentation fault occurred. I don't know why the branch instruction is wrong. I have not learned the ARM architecture, so I don't know whether the ARM branch instruction has some limits.
The addresses you are branching to are odd, which means the functions are Thumb code.
There is an obvious problem with your approach.
If target is in Thumb mode, you either need to be in Thumb mode at the point you are branching from or you need to use a bx (Branch and Exchange) instruction.
Your functions are in Thumb mode (+1 at the target) but you are using ARM mode branch coding (B A1 coding?), so obviously either you are not in Thumb mode or you are using ARM mode instruction in Thumb mode.
The ARM family allows loading of registers with values. One of those registers is the PC (Program Counter).
Some alternatives:
You could use a function to load the PC register with the destination address (absolute).
Add an offset to the PC register.
Use a multiply-and-add instruction with the PC register.
Push the register holding the destination onto the stack and pop it into the PC register.
These choices, plus modifying the destination of the branch instruction, are all different options; none of them is "the best". Pick the one that suits you and is easiest to maintain.
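For reference, the arithmetic from the question can be written down as a small C sketch. It only illustrates the ARM-state B encoding and the Thumb-bit check; it would not make the injected branch work here, because the functions in question are Thumb code. The function names are invented for this example.

```c
#include <stdint.h>

/* Bit 0 set in a function address marks Thumb code on ARM. */
int is_thumb_address(uint32_t addr)
{
    return (int)(addr & 1u);
}

/* Encode an unconditional ARM-state "B target" located at address at.
 * Both addresses must be word aligned. The offset is relative to at + 8
 * and stored as a signed 24-bit word count; unsigned wrap-around gives
 * the two's-complement field for backward branches. */
uint32_t encode_arm_b(uint32_t at, uint32_t target)
{
    uint32_t diff  = target - (at + 8u);
    uint32_t imm24 = (diff >> 2) & 0x00FFFFFFu;
    return 0xEA000000u | imm24;
}

/* Store a 32-bit instruction word in little-endian byte order. */
void store_le32(uint8_t out[4], uint32_t word)
{
    out[0] = (uint8_t)(word & 0xFFu);
    out[1] = (uint8_t)((word >> 8) & 0xFFu);
    out[2] = (uint8_t)((word >> 16) & 0xFFu);
    out[3] = (uint8_t)((word >> 24) & 0xFFu);
}
```

With the Thumb bit stripped (0x91d0 and 0x91f0), encode_arm_b reproduces the 0xea000006 from the question, stored in memory as the bytes 06 00 00 ea. A loader should check is_thumb_address on both functions first and choose a Thumb encoding or a bx-based sequence when it returns nonzero.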

What does a dangerous relocation error mean?

I am getting a linking error:
dangerous relocation: l32r: Literal placed after use:
I am still trying to debug; however, I want to better understand this error. I understand what relocation is; however, I am not sure how it can be dangerous and was looking for some clarification. Also, a small code snippet that could generate this type of error would be helpful.
In short, what is "a dangerous relocation"?
This is a two-part answer, as there are really two questions here, one general ("what's a dangerous relocation?") and one specific to the Xtensa ("why can't you have a literal placed after where it's used in the code?").
What's all this dangerous relocation stuff about, anyway?
To understand what a 'dangerous relocation' is, we must first understand what a relocation is. As a compiler is generating an object file from some piece of code, it will need to reference symbols that are defined somewhere else: perhaps in another object file in the link, or perhaps in a shared library. However, the compiler does not know the addresses of external symbols when compiling a given object file. It must emit a relocation to serve as a named placeholder, telling the linker "OK, shove the address of foobar into this spot, and oh, you have to do X, Y, and Z to it to make it fit into the instructions there."
Most of the time, this works without a hitch, you get a binary out of your linker, and Bob's your uncle. When this process breaks down, and the linker cannot make the address of the symbol the compiler gave it fit into the instructions at the site of the relocation, it gives up and tosses out a 'dangerous relocation' message (among others -- the all-too-common 'relocation truncated to fit' pops out of this process as well) to inform the programmer that something has gone terribly wrong.
What's wrong with a literal placed after where it's used?
Now that we know what a generic 'dangerous relocation' is, we can move on to the second half of the error message, namely "l32r: Literal placed after use". The Xtensa uses an instruction known as L32R to load constant values from memory that don't fit into the Xtensa's MOVI immediate load instruction, which has a 12-bit signed immediate field. The L32R instruction is described in the Xtensa ISA reference as follows:
L32R is a PC-relative 32-bit load from memory. It is typically used to load constant
values into a register when the constant cannot be encoded in a MOVI instruction.
L32R forms a virtual address by adding the 16-bit one-extended constant value encoded
in the instruction word shifted left by two to the address of the L32R
plus three with the two least significant bits cleared. Therefore, the offset can always
specify 32-bit aligned addresses from -262141 to -4 bytes from the address of the L32R
instruction. 32 bits (four bytes) are read from the physical address. This data is then
written to address register at.
Given the restrictions on L32R quoted above, the error message breaks down quite nicely: the compiler generated a L32R to load a constant (which could be a value or an address) somewhere in your code, but either the constant's value was not available to the compiler (think extern const), or the address needed to be filled in by the linker (this is the likely case). So, it emitted this L32R relocation to tell the linker to 'fill in the blank' in the L32R instruction with the address of a constant value or constant address somewhere in your program. However, the linker couldn't find anywhere in the previous 256KB of code -- or literal pool, depending on how your compiler and Xtensa core are configured -- to shove a constant, so it gave up and spat out the error message you asked about.
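The address computation quoted from the ISA reference can be sketched in C as below, which also makes the "literal placed after use" condition concrete. This assumes the plain PC-relative mode (no absolute literals); the function names are made up for illustration and are not part of any Xtensa toolchain.

```c
#include <stdint.h>
#include <stdbool.h>

/* Address an L32R at pc loads from, given its 16-bit immediate field:
 * ((pc + 3) & ~3) + (ones-extended imm16 shifted left by two). */
uint32_t l32r_target(uint32_t pc, uint16_t imm16)
{
    uint32_t base   = (pc + 3u) & ~3u;
    uint32_t offset = 0xFFFC0000u | ((uint32_t)imm16 << 2);
    return base + offset;  /* offset is always negative: -262144 .. -4 */
}

/* True if a literal at address lit can be reached by an L32R at pc:
 * it must be 32-bit aligned and lie 4 to 262144 bytes before the
 * aligned base. A literal placed after the instruction makes the
 * unsigned distance wrap to a huge value, so it is rejected. */
bool l32r_can_reach(uint32_t pc, uint32_t lit)
{
    uint32_t base = (pc + 3u) & ~3u;
    uint32_t back = base - lit;
    return (lit & 3u) == 0u && back >= 4u && back <= 0x40000u;
}
```

A literal located after the L32R, or more than 256KB before it, fails the check, which corresponds to the two situations discussed in this answer.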
How does one fix this?
Unfortunately, a 'dangerous relocation' of this sort depends on code size, so unless you have a bona fide compiler or linker bug on your hands, reproducing it with a small snippet of code will be impossible. There are two possible causes you can try to address, though.
There's no room for my literal pool!
If you are compiling with -mno-text-section-literals (which is the default), the linker gets fed the literal pools as separate sections which it then has to interleave with the code sections. If you have a particularly large object file in your link, it may have over 256KB of code in its .text section, leaving nowhere in the range of a L32R instruction for the linker to place the associated literal pool section at. Compiling with -mtext-section-literals should eliminate the error; if it does not work, you have that flag on already, or if you are using -ffunction-sections (which places each function into its own section; it is sometimes used in embedded work to allow the linker to throw out unused code), read on.
The linker (or assembler) still can't find a place to put my literals!
When the compiler and assembler are told to emit literals into the text section, they restrict placement of the literal pools to before the functions that use them (i.e. before the ENTRY instruction of the function) in order to minimize the risk that the literal pools will be executed as code, with obviously bad results. If you have an extremely long function in your code -- I shudder to think what sort of function could generate more than 256KB of code -- the 'default' literal pool placed before the ENTRY instruction can wind up out of range of L32R instructions near the end of the function. Normally, the compiler will emit an assembler directive known as .literal_position, as well as a jump around the mid-function literal pool, to provide the assembler and linker with an extra place to shove literals into. You can tell the compiler to output an assembler listing using -save-temps and then search it for .literal_position directives; if one isn't present in a function that has L32R instructions past the 256KB mark, congratulations! You just found a compiler bug!
What else could happen to produce this?
The only other circumstance I see that can provoke such a problem is if there is nowhere before the ENTRY instruction that the compiler or linker can put a literal pool, and the compiler can't figure this out on its own -- this can occur with interrupt handlers, or functions that are explicitly placed at the beginning of a physical memory boundary by the linker script. In this case, you will need to insert the .literal_position directive and its associated jump & label by hand in an asm statement at the top of the culprit function in order to provide the assembler with a place to put the culprit function's literals. As the GAS manual puts it:
The assembler will automatically place text section literal pools before ENTRY
instructions, so the .literal_position directive is only needed to specify some other
location for a literal pool. You may need to add an explicit jump instruction to skip
over an inline literal pool.
For example, an interrupt vector does not begin with an ENTRY instruction so the
assembler will be unable to automatically find a good place to put a literal pool.
Moreover, the code for the interrupt vector must be at a specific starting address, so
the literal pool cannot come before the start of the code. The literal pool for the
vector must be explicitly positioned in the middle of the vector (before any uses of the
literals, due to the negative offsets used by PC-relative L32R instructions).
Wait, I'm using the absolute literal option!
If you have the LITBASE option enabled in your Xtensa core and are getting this error, this is a sign that your literal pool has overflowed. The compiler should generate the 'glue' needed to switch literal pools in this case, though: if it doesn't, congratulations! You have just found a compiler bug!
This mailing-list post might be helpful for you: http://www.mail-archive.com/mspgcc-users#lists.sourceforge.net/msg11488.html
Good luck :)