AWS Neoverse N1 r3p1: MOVSNE pc, lr = Illegal instruction unconditionally

On a Neoverse N1 r3p1 in an AWS t4g.nano instance, in 32-bit user mode, each of the following pieces of code results in an illegal instruction exception:
teq pc, pc
movsne pc, lr
and
teq pc, pc
ldmfdne sp!, {pc}
However, from my reading of the current Arm Architecture Reference Manual, this should work. First, teq pc, pc should set the Z flag in 32-bit modes (and clear Z in 26-bit modes if any of the PSR bits is set). Second, the pseudocode for MOVS states:
if ConditionPassed() then
    EncodingSpecificOperations();
    (shifted, carry) = Shift_C(R[m], shift_t, shift_n, PSTATE.C);
    result = shifted;
    if d == 15 then
        if setflags then
            ALUExceptionReturn(result);
        else
            ALUWritePC(result);
    else
        // else branch snipped
ALUExceptionReturn(result) is CONSTRAINED UNPREDICTABLE in this case, but control flow shouldn't reach it. In 32-bit modes Z is set, so the NE condition fails and ConditionPassed() returns FALSE.
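The reasoning above can be checked with a small C model of the flag computation. This is a simplified sketch and is not taken from the Arm ARM. It assumes the classic 26-bit rule: R15 read as the first operand (Rn) yields only the PC bits, while R15 read as the second operand (Rm) also includes the PSR bits. It also assumes that in 32-bit modes both reads return the same value. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model (assumption): in 26-bit modes R15 holds the PC in
   bits [25:2] and the PSR bits (N, Z, C, V, I, F in [31:26], mode in
   [1:0]) in the rest. As Rn only the PC bits are read; as Rm the PSR
   bits come along too. In 32-bit modes both reads are identical. */
#define R15_PC_MASK_26 0x03FFFFFCu

/* Z flag produced by "teq pc, pc": TEQ sets Z when Rn XOR Rm == 0. */
static bool teq_pc_pc_sets_z(uint32_t r15, bool mode26)
{
    uint32_t rn = mode26 ? (r15 & R15_PC_MASK_26) : r15;
    uint32_t rm = r15;
    return (rn ^ rm) == 0;
}

/* The NE condition passes only when Z is clear, so this is whether the
   following MOVSNE / LDMFDNE would be executed at all. */
static bool movsne_executes(uint32_t r15, bool mode26)
{
    return !teq_pc_pc_sets_z(r15, mode26);
}
```

With a 32-bit-mode R15 value such as 0x00008008, teq_pc_pc_sets_z() returns true and movsne_executes() returns false. That is why an unconditional SIGILL on the skipped instruction is surprising.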
Is my understanding wrong or is the CPU broken?
Is it safe to replace the offending instruction in an exception handler (a Linux SIGILL handler) without stopping other threads?
A complete test program for Linux:
.syntax unified
.global _start
_start:
teq pc, pc
movsne pc, lr
mov r7, #1
svc 0
The intent is to return restoring the PSR in 26-bit modes, in order to comply with an old ABI on sufficiently old hardware.

Related

Debugging a nasty SIGILL crash: Text Segment corruption

Ours is a PowerPC-based embedded system running Linux. We are encountering a random SIGILL crash across a wide variety of applications. The root cause of the crash is that the instruction to be executed has been zeroed out, which indicates corruption of the text segment in memory. As the text segment is loaded read-only, the application itself cannot corrupt it, so I suspect that some common subsystem (DMA?) is causing this corruption. Since the problem takes days to reproduce (as a SIGILL crash), it is difficult to investigate. So, to begin with, I want to be able to know if and when the text segment of any application has been corrupted.
I have looked at the stack trace, and all the pointers and registers look valid.
Do you have any suggestions on how I can go about this?
Some Info:
Linux 3.12.19-rt30 #1 SMP Fri Mar 11 01:31:24 IST 2016 ppc64 GNU/Linux
(gdb) bt
0 0x10457dc0 in xxx
Disassembly output:
=> 0x10457dc0 <+80>: mr r1,r11
0x10457dc4 <+84>: blr
Instruction expected at address 0x10457dc0: 0x7d615b78
Instruction found after catching SIGILL 0x10457dc0: 0x00000000
(gdb) maintenance info sections
0x10006c60->0x106cecac at 0x00006c60: .text ALLOC LOAD READONLY CODE HAS_CONTENTS
Expected (from the application binary):
(gdb) x /32 0x10457da0
0x10457da0 : 0x913e0000 0x4bff4f5d 0x397f0020 0x800b0004
0x10457db0 : 0x83abfff4 0x83cbfff8 0x7c0803a6 0x83ebfffc
0x10457dc0 : 0x7d615b78 0x4e800020 0x7c7d1b78 0x7fc3f378
0x10457dd0 : 0x4bcd8be5 0x7fa3eb78 0x4857e109 0x9421fff0
Actual (after handling SIGILL and dumping nearby memory locations):
Faulting instruction address: 0x10457dc0
0x10457da0 : 0x913E0000
0x10457db0 : 0x83ABFFF4
=> 0x10457dc0 : 0x00000000
0x10457dd0 : 0x4BCD8BE5
0x10457de0 : 0x93E1000C
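One way to learn whether the text has been corrupted before the SIGILL fires is to periodically compare the in-memory words with a reference copy taken from the binary. This is a minimal sketch and the helper name is hypothetical. How the reference copy and the live words are obtained (for example from the ELF file and from the process's own mapping) is left as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compares a reference copy of .text words with the
   words currently in memory. Returns the index of the first differing
   word, or -1 if the two ranges match. */
static long first_mismatch(const uint32_t *expected, const uint32_t *actual,
                           size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        if (expected[i] != actual[i])
            return (long)i;
    }
    return -1;
}
```

Take the expected row at 0x10457dc0 from the dump above and zero its first word, as observed. first_mismatch() then returns 0, which corresponds to address 0x10457dc0.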
Edit:
One lead we have is that the corruption always occurs at an address ending in 0xdc0.
For example:
Faulting instruction address: 0x10653dc0 << printed by our application after catching SIGILL
Faulting instruction address: 0x1000ddc0 << printed by our application after catching SIGILL
flash_erase[8557]: unhandled signal 4 at 0fed6dc0 nip 0fed6dc0 lr 0fed6dac code 30001
nandwrite[8561]: unhandled signal 4 at 0fed6dc0 nip 0fed6dc0 lr 0fed6dac code 30001
awk[4448]: unhandled signal 4 at 0fe09dc0 nip 0fe09dc0 lr 0fe09dbc code 30001
awk[16002]: unhandled signal 4 at 0fe09dc0 nip 0fe09dc0 lr 0fe09dbc code 30001
getStats[20670]: unhandled signal 4 at 0fecfdc0 nip 0fecfdc0 lr 0fecfdbc code 30001
expr[27923]: unhandled signal 4 at 0fe74dc0 nip 0fe74dc0 lr 0fe74dc0 code 30001
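All of the faulting addresses above agree in their low 12 bits. That is what you would expect if the same physical page is hit in every process: translation preserves the offset within the page, while the virtual page differs. The quick check below assumes 4 KiB pages, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u /* assumption: 4 KiB pages */

/* Offset of an address within its 4 KiB page (the low 12 bits). */
static uint32_t page_offset(uint32_t addr)
{
    return addr & (PAGE_SIZE_4K - 1u);
}

/* Returns 1 if every address has the same in-page offset, else 0. */
static int same_page_offset(const uint32_t *addrs, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (page_offset(addrs[i]) != page_offset(addrs[0]))
            return 0;
    }
    return 1;
}
```

For each of the addresses in the log lines above, page_offset() gives 0xdc0.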
Edit 2: Another lead is that the corruption always occurs at physical frame number 0x00a4d. With a PAGE_SIZE of 4096, I suppose this translates to the physical address 0x00A4DDC0. We suspect a couple of our kernel drivers and are investigating further. Is there a better approach (such as a hardware watchpoint) that would be more efficient? What about KASAN, as suggested below?
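The translation in Edit 2 is the usual one: shift the page frame number by the page size, then add the offset within the page. Below is a worked check in C, assuming 4 KiB pages (a PAGE_SHIFT of 12) as stated. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12u /* assumption: PAGE_SIZE = 4096 = 1 << 12 */

/* Physical address = (page frame number << PAGE_SHIFT) | offset in page. */
static uint64_t pfn_to_phys(uint64_t pfn, uint64_t vaddr)
{
    uint64_t offset = vaddr & ((1u << PAGE_SHIFT_4K) - 1u);
    return (pfn << PAGE_SHIFT_4K) | offset;
}
```

pfn_to_phys(0xa4d, 0x10457dc0) gives (0xa4d << 12) | 0xdc0 = 0xA4DDC0, which matches the address above.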
Any help is appreciated. Thanks.

1.) The text segment is read-only, but its permissions can be changed with mprotect(); check for that if you think it is possible.
2.) If it is a kernel problem:
Run the kernel with the KASAN and UBSAN (undefined behaviour) sanitizers.
Focus on driver code that is not in mainline.
The hint here is that the corruption is tiny (a single instruction word). Maybe I'm wrong, but that suggests DMA is not to blame; it looks like some kind of invalid store.
3.) Hardware: I think your problem looks like a hardware (RAM) issue.
Try decreasing the RAM frequency in the bootloader.
Check whether the problem reproduces with stable mainline software; that is how you can prove it is the hardware.
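For point 1, one way to see whether something has made the text writable is to check the mapping's permissions in /proc/<pid>/maps, for example from the SIGILL handler or periodically. This is a minimal parsing sketch. The helper names are hypothetical, and the example maps lines used with it are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parses one line of /proc/<pid>/maps, e.g.
   "10000000-106d0000 r-xp 00000000 1f:02 1234 /usr/bin/app".
   If addr lies inside the mapping, stores the 4-character permission
   string (such as "r-xp") in perms and returns 1; otherwise returns 0. */
static int maps_line_perms(const char *line, unsigned long addr, char perms[5])
{
    unsigned long start, end;
    if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
        return 0;
    return addr >= start && addr < end;
}

/* A text mapping that has become writable (e.g. via mprotect()) shows
   'w' as the second permission character. */
static int perms_writable(const char perms[5])
{
    return perms[1] == 'w';
}
```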

find where the interrupt happened on cortex-m4

I am trying to find where in my code a specific interrupt happened. In this case it is on an STM32F4 microcontroller, and the interrupt handler is SysTick_Handler.
Basically, I want to figure out where the program was when the SysTick interrupt fired. I am using arm-none-eabi-gdb to try to get a backtrace, but the only information I get from it is:
(gdb) bt
#0 SysTick_Handler () at modules/profiling.c:66
#1 <signal handler called>
#2 0x55555554 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
How can I get some information about where the program was before the interrupt fired?
Looking at the Arm documentation here, it seems I should be able to read the stack pointer and get the PC from there. But isn't that exactly what the GDB unwinder is doing?

You were on the right track at the end of your question. The ARM Cortex-M cores have two stack pointers, the main stack pointer (MSP, used for interrupts) and the process stack pointer (PSP, used for tasks).
When an interrupt of sufficient priority comes in, a subset of the registers (R0 to R3, R12, LR, PC and xPSR) is pushed onto the current stack. That is the PSP if the background application was interrupted while running on it, or the MSP if a lower-priority interrupt was interrupted. The core then switches to the MSP (if it is not already on it).
When you first enter an interrupt, the link register (LR) will hold a value that is mostly F's rather than an actual return address. This value, called EXC_RETURN, tells the core how to exit when it is branched to. Typically, you'll see 0xFFFFFFFD if a background task running on the PSP was interrupted (0xFFFFFFF9 if it was running on the MSP), or 0xFFFFFFF1 if a lower-priority interrupt was interrupted. These values differ if you are using the floating-point unit. The useful part of this value, though, is bit 2 (0x4): it tells you whether your stack frame is on the PSP (bit set) or the MSP (bit clear).
Once you determine which stack your frame is on, you can find the address you were executing from at that stack pointer plus 24 (0x18, six 32-bit words), where the stacked PC is stored. See Figure 2.3 in your link. This gives you the PC at which you were interrupted.
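The steps above can be modeled on a host in plain C: decode bit 2 of the EXC_RETURN value, then read the stacked PC at byte offset 0x18 from the base of the exception frame. This sketch assumes the basic 8-word frame, with no FPU state stacked. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic (non-FPU) Cortex-M exception frame, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame;

/* Bit 2 of EXC_RETURN: 1 = frame is on the PSP, 0 = frame is on the MSP. */
static int frame_on_psp(uint32_t exc_return)
{
    return (exc_return & 0x4u) != 0;
}

/* The stacked PC is the 7th word, i.e. byte offset 0x18 from the frame base. */
static uint32_t stacked_pc(const uint32_t *frame_base)
{
    return frame_base[6];
}
```

Apply it to the PSP dump shown further down this page (R0 = 1 through xPSR = 0x21000000): stacked_pc() returns 0x01000074, the interrupted address.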

As many of you commented, the PC could be on either of two stacks. The way I solved it was by finding some HardFault handler code in assembly and taking what I needed from it. To get the PC value correctly, I am using the following code:
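uint32_t stacked_pc;
// Must run first in the handler: LR still holds EXC_RETURN, and on the MSP
// path nothing else may have been pushed yet (otherwise adjust the offset).
__asm volatile(
"TST lr, #4\n" // EXC_RETURN bit 2: 0 = MSP, 1 = PSP
"ITE EQ\n"
"MRSEQ r0, MSP\n"
"MRSNE r0, PSP\n" // exception frame base now in r0
"LDR %0, [r0, #0x18]\n" // stacked PC is 6 words above the frame base
: "=r" (stacked_pc) : : "r0", "cc");
The address where the interrupt happened can now be accessed by
uint32_t PC = stacked_pc;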
and can now be used for whatever I want it. Unfortunately I did not manage to get GDB to unwind the stack automatically for me. But at least I found out where the interrupt was firing, which was the goal.

We keep seeing this question in various forms, and folks keep saying there are two stacks. So I tried it myself with SysTick.
The documentation says that we are in Thread mode out of reset, and if you halt with OpenOCD it says:
target halted due to debug-request, current mode: Thread
I have some code to dump registers:
20000000 APSR
00000000 IPSR
00000000 EPSR
00000000 CONTROL
00000000 SP_PROCESS
20000D00 SP_PROCESS after I modified it
20000FF0 SP_MAIN
20000FF0 mov r0,sp
then I dump the stack up to 0x20001000 which is where I know my stack started
20000FF0 00000000
20000FF4 00000000
20000FF8 00000000
20000FFC 0100005F
I set up and wait for a SysTick interrupt; the handler dumps registers and RAM and then goes into an infinite loop. This is bad practice in general, but this is just debugging/learning. Before the interrupt I prep some registers:
.thumb_func
.globl iwait
iwait:
mov r0,#1
mov r1,#2
mov r2,#3
mov r3,#4
mov r4,#13
mov r12,r4
mov r4,#15
mov r14,r4
b .
and in the handler I see
20000000 APSR
0000000F IPSR
00000000 EPSR
00000000 CONTROL
20000D00 SP_PROCESS
20000FC0 SP_MAIN
20000FC0 mov r0,sp
20000FC0 0000000F
20000FC4 20000FFF
20000FC8 00000000
20000FCC FFFFFFF9 this is our special lr (not the one rjp mentioned)
20000FD0 00000001 this is r0
20000FD4 00000002 this is r1
20000FD8 00000003 this is r2
20000FDC 00000004 this is r3
20000FE0 0000000D this is r12
20000FE4 0000000F this is r14/lr
20000FE8 01000074 and this is where we were interrupted from
20000FEC 21000000 this is probably the xPSR mentioned
20000FF0 00000000 stuff that was there before
20000FF4 00000000
20000FF8 00000000
20000FFC 0100005F
01000064 <iwait>:
1000064: 2001 movs r0, #1
1000066: 2102 movs r1, #2
1000068: 2203 movs r2, #3
100006a: 2304 movs r3, #4
100006c: 240d movs r4, #13
100006e: 46a4 mov ip, r4
1000070: 240f movs r4, #15
1000072: 46a6 mov lr, r4
1000074: e7fe b.n 1000074 <iwait+0x10>
1000076: bf00 nop
So in this case, straight out of the ARM documentation, it is not using sp_process; it is using sp_main. It is pushing the items the manual says it pushes, including the interrupted/return address, which is 0x1000074.
Now, if I set the SPSEL bit (be careful to set the PSP first), it appears that a mov r0,sp in application/thread mode uses the PSP, not the MSP. The handler, however, uses the MSP for a mov r0,sp, but appears to put the exception frame on the PSP.
Before, in thread/foreground:
20000000 APSR
00000000 IPSR
00000000 EPSR
00000000 SP_PROCESS
20000D00 SP_PROCESS modified
00000000 CONTROL
00000002 CONTROL modified
20000FF0 SP_MAIN
20000D00 mov r0,sp
now in the handler
20000000 APSR
0000000F IPSR
00000000 EPSR
00000000 CONTROL (interesting!)
20000CE0 SP_PROCESS
20000FE0 SP_MAIN
20000FE0 mov r0,sp
dump of that stack
20000FE0 0000000F
20000FE4 20000CFF
20000FE8 00000000
20000FEC FFFFFFFD
20000FF0 00000000
20000FF4 00000000
20000FF8 00000000
20000FFC 0100005F
dump of sp_process stack
20000CE0 00000001
20000CE4 00000002
20000CE8 00000003
20000CEC 00000004
20000CF0 0000000D
20000CF4 0000000F
20000CF8 01000074 our return value
20000CFC 21000000
So to end up dealing with the alternate stack that folks keep mentioning, you (or some code you rely on) have to put yourself in that position. Why you would want to do that for simple bare-metal programs, who knows. With the CONTROL register at all zeros everything is nice and easy, and the code can share one stack just fine.
I don't use gdb, but you need to get it to dump all the registers, including sp_process and sp_main. Then, depending on what you find, dump a dozen or so words at each. In there you should see a 0xFFFFFFFx value as a marker; count down from it to find the return address. You can also have your handler read the two stack pointers and then look at the general-purpose registers. With the GNU assembler, use mrs rX, psp and mrs rX, msp for the process and main stack pointers.
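The "find the 0xFFFFFFFx marker and count down" approach can be written as a host-side helper that scans a word dump. This is a heuristic sketch under stated assumptions. It only recognizes the three basic (non-FPU) EXC_RETURN values. It also assumes the layout of the MSP dump above, where the handler's saved LR sits immediately below the hardware-stacked frame. It does not apply when the frame is on the PSP and the marker is on the MSP. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Heuristic: is w one of the basic (non-FPU) EXC_RETURN values? */
static int is_exc_return(uint32_t w)
{
    return w == 0xFFFFFFF1u || w == 0xFFFFFFF9u || w == 0xFFFFFFFDu;
}

/* Scans a word dump from the stack pointer towards higher addresses.
   Assumes the EXC_RETURN marker sits directly below the hardware frame,
   so the frame starts at marker + 1 and the stacked PC is 6 words later.
   Returns 1 and stores the PC on success, 0 if no marker is found. */
static int find_return_address(const uint32_t *words, size_t n, uint32_t *pc)
{
    for (size_t i = 0; i + 7 < n; i++) {
        if (is_exc_return(words[i])) {
            *pc = words[i + 7];
            return 1;
        }
    }
    return 0;
}
```

Fed the twelve MSP words from 0x20000FC0 to 0x20000FEC, it finds 0xFFFFFFF9 at 0x20000FCC and returns 0x01000074.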

This is called DEBUGGING. The easiest way to get started is to just stick a bunch of printf() calls here and there throughout the code. Run the program. If it prints out:
got to point A
got to point B
got to point C
and dies, then you know it died between "C" and "D." You can now refine that downwards by festooning the code between "C" and "D" with more closely spaced printf() calls.
This is the best way for a beginner to get started. Many seasoned experts also prefer printf() for debugging. Debuggers can get in the way.
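A small convenience macro makes the "got to point X" approach quicker and tags each message with its source location. This is a sketch; the macro and helper names are made up for illustration. Printing to stderr matters here: stdout is often buffered, so output written just before a crash can be lost. stderr is never fully buffered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats one checkpoint line, e.g. "got to point C (main.c:42)\n". */
static int format_checkpoint(char *buf, size_t n, const char *label,
                             const char *file, int line)
{
    return snprintf(buf, n, "got to point %s (%s:%d)\n", label, file, line);
}

/* Writes the checkpoint to stderr, which is not fully buffered, so the
   line is not left sitting in a buffer if the program dies right after. */
#define CHECKPOINT(label)                                         \
    do {                                                          \
        char cp_buf_[128];                                        \
        format_checkpoint(cp_buf_, sizeof cp_buf_, (label),       \
                          __FILE__, __LINE__);                    \
        fputs(cp_buf_, stderr);                                   \
    } while (0)
```

Sprinkle CHECKPOINT("A"), CHECKPOINT("B") and so on through the code, then narrow down between the last label printed and the next one.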

Need information about using Inline Assembly for WinCE, ARM9

I am not very good at inline assembly, but I am planning to use it for optimization in an embedded project. As I don't know much about it, I need some help.
I am using Win CE 6.0 on ARM9, with MS Visual Studio 2005 (using MFC).
Basically, I want to make memory access faster and do some bitwise operations.
It would be really helpful if I could get an online link or some examples of using registers, variable names, pointers (some memory transfer and bitwise operation related stuff), etc. for my particular environment.
EDIT after ctacke's answer:
It would be really helpful if there were a link or small examples for working with .s files. I am specifically interested in writing and exporting functions from .s files, and in the steps involved in combining them with my MFC application. Any small example would do.
Thank You.
Kind Regards,
Aftab

The ARM compilers that ship with Visual Studio (all versions) do not support inline ASM; only the x86 compilers do. To use ASM for ARM (or for SH or MIPS), you have to create a separate code file (typically a .s file), export functions from your ASM and call those.
EDIT
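Here's a simple example (taken from here). Note that it was written for an LPC1768 (a Cortex-M3 mbed board), so on an ARM9 the Thumb-2 ITE and MOV.W instructions need to be replaced with ordinary conditional ARM instructions: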
AREA asm_func, CODE, READONLY
; Export my_asm function location so that C compiler can find it and link
EXPORT my_asm
my_asm
;
; ARM Assembly language function to set LED1 bit to a value passed from C
; LED1 gets value (passed from C compiler in R0)
; LED1 is on GPIO port 1 bit 18
; See Chapter 9 in the LPC1768 User Manual
; for all of the GPIO register info and addresses
; Pinnames.h has the mbed modules pin port and bit connections
;
; Load GPIO Port 1 base address in register R1
LDR R1, =0x2009C020 ; 0x2009C020 = GPIO port 1 base address
; Move bit mask in register R2 for bit 18 only
MOV.W R2, #0x040000 ; 0x040000 = 1<<18 all "0"s with a "1" in bit 18
; value passed from C compiler code is in R0 - compare to a "0"
CMP R0, #0 ; value == 0 ?
; (If-Then-Else) on next two instructions using equal cond from the zero flag
ITE EQ
; STORE if EQ - clear led 1 port bit using GPIO FIOCLR register and mask
STREQ R2, [R1,#0x1C] ; if==0, clear LED1 bit
; STORE if NE - set led 1 port bit using GPIO FIOSET register and mask
STRNE R2, [R1,#0x18] ; if==1, set LED1 bit
; Return to C using link register (Branch indirect using LR - a return)
BX LR
END
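For comparison, here is the same logic in plain C. It is a useful reference when checking whether hand-written assembly actually beats the compiler. This is a sketch with hypothetical names (set_led1, LED1_MASK). To call the real my_asm from a C++/MFC source file, declare it as extern "C" void my_asm(int value); so that the C++ compiler does not mangle the name.

```c
#include <assert.h>
#include <stdint.h>

#define LED1_MASK  (1u << 18) /* bit 18, as in the assembly above */
#define FIOSET_OFF 0x18u      /* byte offset used by STRNE (set) */
#define FIOCLR_OFF 0x1Cu      /* byte offset used by STREQ (clear) */

/* Plain C equivalent of my_asm: writes the bit-18 mask to the "clear"
   register if value == 0, otherwise to the "set" register. On the
   LPC1768 port_base would be (volatile uint32_t *)0x2009C020; it is a
   parameter here so the logic can be exercised against ordinary memory. */
static void set_led1(volatile uint32_t *port_base, int value)
{
    if (value == 0)
        port_base[FIOCLR_OFF / 4] = LED1_MASK;
    else
        port_base[FIOSET_OFF / 4] = LED1_MASK;
}
```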

How to print register values in GDB?

How do I print the value of %eax and %ebp?
(gdb) p $eax
$1 = void

info registers shows all the registers; info registers eax shows just the register eax. The command can be abbreviated as i r.

If you're trying to print a specific register in GDB, you have to omit the % sign. For example,
info registers eip
If your executable is 64-bit, the registers start with r; starting them with e is not valid.
info registers rip
Those can be abbreviated to:
i r rip

There is also:
info all-registers
Then you can get the register name you are interested in -- very useful for finding platform-specific registers (like NEON Q... on ARM).

If you only want to check once, info registers shows the registers.
If you only want to watch one register, use display; for example, display $esp keeps displaying esp each time the program stops.
If you want to watch all registers, layout regs continuously shows them in TUI mode.

Gdb commands:
i r <register_name>: print a single register, e.g. i r rax, i r eax
i r <register_name_1> <register_name_2> ...: print multiple registers, e.g. i r rdi rsi
i r: print all registers except floating-point and vector registers (xmm, ymm, zmm).
i r a: print all registers, including floating-point and vector registers (xmm, ymm, zmm).
i r f: print all FPU floating-point registers (st0-7 and a few other f*)
Other register groups besides a (all) and f (float) can be found with:
maint print reggroups
as documented at: https://sourceware.org/gdb/current/onlinedocs/gdb/Registers.html#Registers
Tips:
xmm0 to xmm15 are 128 bits. Almost every modern machine has them; they were introduced in 1999.
ymm0 to ymm15 are 256 bits. Newer machines usually have them; they were introduced in 2011.
zmm0 to zmm31 are 512 bits. A normal PC probably doesn't have them (as of 2016); they were introduced in 2013 and so far are mainly used in servers.
Only one series of xmm/ymm/zmm will be shown, because they are the same registers in different modes. On my machine ymm is shown.

p $eax works as of GDB 7.7.1
Tested as of GDB 7.7.1, the command you've tried works:
set $eax = 0
p $eax
# $1 = 0
set $eax = 1
p $eax
# $2 = 1
This syntax can also be used to select between different union members e.g. for ARM floating point registers that can be either floating point or integers:
p $s0.f
p $s0.u
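The .f and .u views are two interpretations of the same 32 bits. As a host-side illustration, assuming IEEE-754 single precision, the bit pattern 0x3F800000 is 1.0 when viewed as a float. The helper names are hypothetical, and memcpy is used because it is the portable way to reinterpret bits in C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* View a 32-bit pattern as a float, like p $s0.f after p $s0.u. */
static float bits_as_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* View a float as its raw 32-bit pattern, like p $s0.u. */
static uint32_t float_as_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```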
From the docs:
Any name preceded by ‘$’ can be used for a convenience variable, unless it is one of the predefined machine-specific register names.
and:
You can refer to machine register contents, in expressions, as variables with names starting with ‘$’. The names of registers are different for each machine; use info registers to see the names used on your machine.
But I haven't had much luck with control registers so far: OSDev 2012 http://f.osdev.org/viewtopic.php?f=1&t=25968 || 2005 feature request https://www.sourceware.org/ml/gdb/2005-03/msg00158.html || alt.lang.asm 2013 https://groups.google.com/forum/#!topic/alt.lang.asm/JC7YS3Wu31I
ARM floating point registers
See: https://reverseengineering.stackexchange.com/questions/8992/floating-point-registers-on-arm/20623#20623

Easiest for me is:
(gdb) x/x $eax
The first x stands for examine and the second x means hex. Note that this examines the memory at the address held in eax, not eax itself. You can see other formats using:
(gdb) help x
You can easily print strings with x/s $eax or return addresses with x/a $ebp+4.

Why would cortex-m3 reset to address 0 in gdb?

I am building a cross-compile toolchain for the Stellaris LM3S8962 Cortex-M3 chip. The test C++ application I have written runs for some time and then faults. The fault occurs when I try to access a memory-mapped hardware device. At the moment my working hypothesis is that I am missing some essential chip initialization in my startup sequence.
What I would like to understand is why execution in gdb gets halted with the program counter set to 0. I have the vector table at 0x0, but the first value is the stack pointer. Shouldn't I end up in one of the fault handlers I specify in the vector table?
(gdb)
187 UARTSend((unsigned char *)secret, 2);
(gdb) cont
Continuing.
lm3s.cpu -- clearing lockup after double fault
Program received signal SIGINT, Interrupt.
0x00000000 in g_pfnVectors ()
(gdb) info registers
r0 0x1 1
r1 0x32 50
r2 0xffffffff 4294967295
r3 0x0 0
r4 0x74518808 1951500296
r5 0xc24c0551 3259762001
r6 0x42052dac 1107635628
r7 0x20007230 536900144
r8 0xf85444a9 4166272169
r9 0xc450591b 3293600027
r10 0xd8812546 3632342342
r11 0xb8420815 3091335189
r12 0x3 3
sp 0x200071f0 0x200071f0
lr 0xfffffff1 4294967281
pc 0x1 0x1 <g_pfnVectors+1>
fps 0x0 0
cpsr 0x60000023 1610612771
The toolchain is based on gcc, gdb, openocd.

GDB gave you a clue:
clearing lockup after double fault
Your CPU was in the locked-up state. That means it could not run its HardFault handler (maybe there is a 0 in its vector entry).
I usually get these when I forget to "power" the peripheral. The resulting bus error escalates first to a HardFault and then to the locked-up state. This should be mentioned in the manual of your MCU, by the way.
