Software interrupt exception or undefined instruction exception? - gdb

I am running a program on a bare-metal ARM (v5TE-compliant) with a JTAG connector and gdb. The program runs from SDRAM in supervisor mode and uses only ARM instructions.
At some point an exception occurs. Stopping gdb with ctrl+C, I can see that the CPSR indicates undefined exception mode, but the program counter points at the software interrupt vector (0xffff0008). According to the ARM ARM, when an undefined instruction exception occurs, the PC should be set to 0xffff0004 or 0x00000004. What is happening to my program: did a SWI occur, or an undefined instruction exception?
Edit to make my question clearer:
My program's purpose is to test the hardware of a custom board. When there is a hardware problem, the program in RAM can be corrupted (as can be seen below), which is what generates the exception. When the hardware is normal, the test software runs without problems. My RAM addresses range from 0 to 0x40000000, and the program is loaded between 0x1000 and 0x2000. The supervisor mode stack pointer is set to 0xff0. The interrupt vector table consists only of breakpoints.
(gdb) c
Continuing.
^C^C
Program received signal SIGTRAP, Trace/breakpoint trap.
0xffff0008 in ?? ()
Registers from the undefined exception mode:
(gdb) i r
r0 0x52878 338040
r1 0x2020000 33685504
r2 0x2020000 33685504
r3 0x2020000 33685504
r4 0x2020000 33685504
r5 0x2020000 33685504
r6 0x2020000 33685504
r7 0x2020000 33685504
r8 0x2020000 33685504
r9 0x2020000 33685504
r10 0x2020000 33685504
r11 0x2020000 33685504
r12 0x2020000 33685504
sp 0x2020000 0x2020000
lr 0xffff0008 4294901768
pc 0xffff0008 0xffff0008
fps 0x0 0
cpsr 0x800000db 2147483867
Registers from the supervisor mode:
(gdb) set $cpsr=0xd3
(gdb) i r
r0 0x52878 338040
r1 0x2020000 33685504
r2 0x2020000 33685504
r3 0x2020000 33685504
r4 0x2020000 33685504
r5 0x2020000 33685504
r6 0x2020000 33685504
r7 0x2020000 33685504
r8 0x2020000 33685504
r9 0x2020000 33685504
r10 0x2020000 33685504
r11 0x2020000 33685504
r12 0x2020000 33685504
sp 0xff3ffffe 0xff3ffffe
lr 0x1020 4128
pc 0xffff0008 0xffff0008
fps 0x0 0
cpsr 0xd3 211
Here is the (corrupted) program in RAM around the address pointed to by the supervisor link register:
(gdb) x/5i 0x1020-8
0x1018 <_start+24>: bic r0, r0, #135168 ; 0x21000
0x101c <_start+28>: strbcs r0, [r0], #1025
0x1020 <_start+32>: mcr 15, 0, r0, cr1, cr0, {0}
0x1024 <_start+36>: ldr r1, [pc, #120] ; 0x10a4 <skip_intreg_reset+100>
0x1028 <_start+40>: ldr r2, [r1, #8]
(gdb) x/4w 0x1018
0x1018 <_start+24>: 0xe3c00a01
0x101C <_start+28>: 0xfec00401
0x1020 <_start+32>: 0xee010f10
0x1024 <_start+36>: 0xe59f1078
Dump from the program object file:
18: e3c00a01 bic r0, r0, #4096 ; 0x1000
1c: e3c00001 bic r0, r0, #1 ; 0x1
20: ee010f10 mcr 15, 0, r0, cr1, cr0, {0}
24: e59f1078 ldr r1, [pc, #120] ; a4 <skip_intreg_reset+0x64>
28: e5912000 ldr r2, [r1]

This is a community wiki answer.
The issue was caused by two different problems:
The wrong vector table was being initialized. The ARM has selectable high and low vectors, and high vectors (0xffff0000) were the default, whereas the code set up its vector table as if it were at 0x00000000. The high vector table contained the following instructions (infinite loops on exceptions):
0xffff0000: b 0xffff0020
0xffff0004: b 0xffff0004
0xffff0008: b 0xffff0008
0xffff000c: b 0xffff000c
0xffff0010: b 0xffff0010
0xffff0014: b 0xffff0014
0xffff0018: b 0xffff0018
0xffff001c: b 0xffff001c
The SDRAM issues on the board corrupted the program content in RAM, which generated undefined instruction exceptions. The program then stopped responding because it was stuck in the infinite loop at 0xffff0004, and the OP stopped gdb. The JTAG debugger used (peedi) actually steps to the next instruction when gdb is stopped with ctrl+C, which is why the pc was 0xffff0008 even though the cpsr indicated an undefined instruction exception, whose vector is at 0xffff0004.
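As a cross-check of the register dumps in the question, the mode field and the vector offset can be decoded by hand. The following is a minimal C sketch, assuming the classic ARM mode encodings in CPSR bits [4:0] and the standard exception vector layout; the function names cpsr_mode_name and vector_name are made up for illustration.

```c
#include <stdint.h>

/* Decode the processor mode from CPSR bits [4:0] (classic ARM encodings). */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case 0x10: return "user";
    case 0x11: return "fiq";
    case 0x12: return "irq";
    case 0x13: return "supervisor";
    case 0x17: return "abort";
    case 0x1B: return "undefined";
    case 0x1F: return "system";
    default:   return "unknown";
    }
}

/* Name the exception vector that an address inside the vector table
 * belongs to. Only the offset from the table base is examined, so this
 * works for both low (0x00000000) and high (0xffff0000) vectors. */
const char *vector_name(uint32_t pc)
{
    switch (pc & 0xFFFFu) {
    case 0x00: return "reset";
    case 0x04: return "undefined instruction";
    case 0x08: return "software interrupt";
    case 0x0C: return "prefetch abort";
    case 0x10: return "data abort";
    case 0x18: return "irq";
    case 0x1C: return "fiq";
    default:   return "not a vector";
    }
}
```

With the values from the dumps, cpsr 0x800000db decodes to undefined mode and 0xd3 to supervisor mode, while pc 0xffff0008 falls on the software interrupt vector, one entry after the undefined instruction vector at 0xffff0004.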

Does armclang save all needed registers on the stack with attribute("IRQ")?

I'm working with Keil ARMCompiler 6.15 (armclang.exe) and I doubt the correctness of the generated assembler code.
It seems to me that the attribute 'interrupt("IRQ")' is ignored.
I would expect r1 and r2 to be saved on the stack, too.
When I remove the attribute 'used', my complete function is removed (optimization).
Can anyone see the mistake I made or what I've forgotten?
Originally the code was written for gcc.
Attributes used for interrupt routines:
#define INTERRUPT_PROCEDURE __attribute__((interrupt("IRQ"),used,section(".IsrSection")))
#define ISR_VARIABLE __attribute__((section(".IsrSection")))
#define FAST_SHARED_DATA __attribute__((section(".FastSharedDataSection")))
C++ Code:
uint64_t volatile FAST_SHARED_DATA systick_value = uint64_t(0);
extern "C" {
void INTERRUPT_PROCEDURE SysTick_Handler()
{
systick_value++;
}
}
Assembler Code:
0x08001280 push {r4, r6, r7, lr}
0x08001282 add r7, sp, #8
0x08001284 mov r4, sp
0x08001286 bfc r4, #0, #3
0x0800128a mov sp, r4
0x0800128c movw r0, #8192 ; 0x2000
0x08001290 movt r0, #8192 ; 0x2000
0x08001294 ldrd r1, r2, [r0]
0x08001298 adds r1, #1
0x0800129a adc.w r2, r2, #0
0x0800129e strd r1, r2, [r0]
0x080012a2 sub.w r4, r7, #8
0x080012a6 mov sp, r4
0x080012a8 pop {r4, r6, r7, pc}
0x080012aa movs r0, r0
0x080012ac movs r0, r0
0x080012ae movs r0, r0
You do not need this attribute. It is needed only in the very rare case where the hardware does not align the stack to 8 bytes on exception entry (the STKALGN bit is not set) and you are going to use functions with 64-bit parameters (like uint64_t). The Cortex-M hardware automatically saves R0-R3, R12, LR, PC and xPSR on the stack when entering the ISR handler, so the handler is free to use r1 and r2 without saving them. If you use the FPU you may want to enable stacking of the FPU registers as well.
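To make the automatic stacking concrete, here is a small C sketch of the basic (no FPU) frame that a Cortex-M core pushes on exception entry, lowest address first. It assumes a Cortex-M target, which the SysTick_Handler name and the STKALGN bit suggest; the struct and function names are hypothetical and not part of any vendor header.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic exception frame as pushed by the hardware (assumed Cortex-M,
 * no floating-point context). The handler sees these eight words at
 * the stack pointer that was in use when the exception was taken. */
struct exception_frame {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* address to resume at       */
    uint32_t xpsr;
};

/* Hypothetical helper: where execution resumes after the handler returns. */
uint32_t frame_return_address(const struct exception_frame *frame)
{
    return frame->pc;
}
```

Because r0-r3 and r12 are already in this frame, a handler written as an ordinary C function only has to preserve the callee-saved registers, which is what the push {r4, r6, r7, lr} in the listing does.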

C++ segmentation fault in list.push_back(): correct on the host, fails on ARM

Program received signal SIGSEGV, Segmentation fault.
0x400741e0 in std::_List_node_base::hook(std::_List_node_base*) ()
from /mnt/yaffs2/Cdatabox/lib/libstdc++.so.6
(gdb) bt
#0 0x400741e0 in std::_List_node_base::hook(std::_List_node_base*) ()
from /mnt/yaffs2/Cdatabox/lib/libstdc++.so.6
#1 0x00012df8 in std::list<std::list<Cbox::SteadyNode, std::allocator<Cbox::SteadyNode> >, std::allocator<std::list<Cbox::SteadyNode, std::allocator<Cbox::SteadyNode> > > >::_M_insert (this=0xbe9d1af0, __position=..., __x=...)
at /opt/arm-2008q3-linux/bin/../lib/gcc/arm-none-linux-gnueabi/4.3.2/../../../../arm-none-linux-gnueabi/include/c++/4.3.2/bits/stl_list.h:1342
#2 0x00012e30 in std::list<std::list<Cbox::SteadyNode, std::allocator<Cbox::SteadyNode> >, std::allocator<std::list<Cbox::SteadyNode, std::allocator<Cbox::SteadyNode> > > >::push_back (this=0xbe9d1af0, __x=...)
at /opt/arm-2008q3-linux/bin/../lib/gcc/arm-none-linux-gnueabi/4.3.2/../../../../arm-none-linux-gnueabi/include/c++/4.3.2/bits/stl_list.h:876
#3 0x0000d508 in Cbox::SteadyAnalysis::__dealSteady (this=0xbe9d1a98)
at ../include/class/SteadyAnalysis.h:237
#4 0x0000dc7c in Cbox::SteadyAnalysis::input (this=0xbe9d1a98,
weight=1467031, rawTime=1552067705)
at ../include/class/SteadyAnalysis.h:110
#5 0x0000deb4 in main (argc=2, argv=0xbe9d1d74) at SteadyAnalysis.cc:30
(gdb) disassemble
Dump of assembler code for function _ZNSt15_List_node_base4hookEPS_:
0x400741d0 <+0>: ldr r3, [r1, #4]
0x400741d4 <+4>: stm r0, {r1, r3}
0x400741d8 <+8>: ldr r2, [r1, #4]
0x400741dc <+12>: str r0, [r1, #4]
=> 0x400741e0 <+16>: str r0, [r2]
0x400741e4 <+20>: bx lr
End of assembler dump.
(gdb) i r
r0 0x31220 201248
r1 0xbe9d1af0 3197967088
r2 0x46 70
r3 0x46 70
r4 0xbe9d1a98 3197967000
r5 0x40dd4c00 1088244736
r6 0x4136629f 1094083231
r7 0x1e400000 507510784
r8 0x41d720ab 1104617643
r9 0x0 0
r10 0x31d20 204064
r11 0xbe9d1944 3197966660
r12 0x30a58 199256
sp 0xbe9d1928 0xbe9d1928
lr 0x12df8 77304
pc 0x400741e0 0x400741e0 <std::_List_node_base::hook(std::_List_node_base*)+16>
cpsr 0x60000010 1610612752
(gdb)
code:
//Type declaration
struct SteadyNode {
double mean;
int duration;
int startLine;
time_t startDetectedRawTime;
time_t endDetectedRawTime;
};
//Definition info
//list<SteadyNode>::iterator upIt, downIt;
//list<SteadyNode> steadyNodeList;
//list<list<SteadyNode> > sleepPiceList;
{
steadyNodeList.back().endDetectedRawTime = currentRawTime;
SteadyNode last = steadyNodeList.back();
if (onBedFlag == 1)
{
downIt = steadyNodeList.end();
list<SteadyNode> onBedMeanList;
onBedMeanList.splice(onBedMeanList.begin(), steadyNodeList, upIt, downIt);
steadyNodeList.clear();
steadyNodeList.push_back(last);
sleepPiceList.push_back(onBedMeanList); //<=== crash position
onBedMeanList.clear();
onBedFlag = -1;
}
else
{
steadyNodeList.clear();
steadyNodeList.push_back(last);
}
}
There is only one version of the source code.
On the Debian 9 host it compiled successfully, valgrind --leak-check=full reported no memory leaks, and the program executed correctly.
On the ARM platform the compilation was also successful, but the program crashed with this error when executed. Any help is appreciated, thank you.
Solved.
The gdb output above was not the main problem.
With gcc 4.3 and gcc 4.4, an STL list does not pre-allocate an element when you write code like list<DiyClass>, but with gcc 6.3 it does. That is why the same code runs fine with gcc 6.3 but segfaults with gcc 4.3.
I guess some low-level implementation details differ between gcc versions. If someone knows the detailed reason, please post an answer, thanks.

ARM vector table pointing one byte after

I have a small application that compiles and runs well on my ARM Cortex-M4. But when I disassemble the binary file that I flash, here is what the first bytes look like:
00000000 <.data>:
0: 20020000 andcs r0, r2, r0
4: 080003b5 stmdaeq r0, {r0, r2, r4, r5, r7, r8, r9}
8: 08000345 stmdaeq r0, {r0, r2, r6, r8, r9}
c: 08000351 stmdaeq r0, {r0, r4, r6, r8, r9}
080003b5 should be the address of the reset handler (I have .word Reset_Handler there), but disassembling the ELF shows that the reset handler is actually located at 080003b4, which is 1 byte earlier:
080003b4 <Reset_Handler>:
80003b4: 2100 movs r1, #0
80003b6: e003 b.n 80003c0 <InitData>
(It's running in Thumb mode, so I have 2-byte instructions.)
Even if I disassemble the binary file, it's located at 080003b4:
000003b4 <.data+0x3b4>:
3b4: 2100 movs r1, #0
3b6: e003 b.n 0x3c0
My question is: why does it point 1 byte after? Surprisingly, this code works on the actual board. Even without disassembling, shouldn't instructions be aligned to 2 bytes? How can the address be 0x000003b5?
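Answer: ARM uses bit 0 of a branch target to select Thumb mode. The bit is not part of the instruction address: a target of 080003b5 means "execute the code at 080003b4 in Thumb state". The Cortex-M4 executes only Thumb instructions, so every entry in its vector table must have bit 0 set.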
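The relationship between the vector table entry and the real instruction address can be sketched in a few lines of C. The function names here are hypothetical and exist only to illustrate the bit manipulation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of a branch target selects the instruction set: 1 means Thumb. */
bool is_thumb_target(uint32_t vector_entry)
{
    return (vector_entry & 1u) != 0;
}

/* The instruction itself lives at the entry with bit 0 cleared. */
uint32_t instruction_address(uint32_t vector_entry)
{
    return vector_entry & ~UINT32_C(1);
}
```

Applied to the dump in the question, the entry 0x080003b5 is a Thumb target whose first instruction is at 0x080003b4, which matches where the disassembler places Reset_Handler.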

'Bus Error' on ARMv6 when working with doubles

I'm creating a C++ program for ARMv6 which crashes with BUS ERROR. Using GDB I have traced the problem to the following code
double d = *(double*)pData; pData += sizeof(int64_t); // char *pData
The program goes through a received message and has to extract some double values using the above code. The received message has several fields, some doubles some not.
On x86 architectures this works fine, but on ARM I get the 'bus error'. So, I suspect my problem is alignment of data -- the double fields have to be aligned to word boundaries in memory on the ARM architecture.
I have tried the following as a fix, which did not work (still got the error):
int64_t i = *(int64_t*)pData;
double d = *((double*)&i);
The following worked (so far):
double d = 0;
memcpy(&d, pData, sizeof(double));
Is using 'memcpy' the best approach? Or, is there a better way?
In my case I do not have control over the packing of the data in the buffer or the order of the fields in the message.
Related question: std::atomic<double> on Armv7 (RPi2) and alignment/bus errors
Is using 'memcpy' the best approach?
In general it's the only correct approach, unless you're targeting a single ABI in which no type requires greater than 1-byte alignment.
The C++ standard is rather verbose, so I'll quote the C standard expressing the same thing much more succinctly:
A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.
There it is: that ever-present spectre of undefined behaviour. Even an x86 compiler is perfectly well allowed to break into your house and rub jam into your hair while you sleep instead of loading that data the way you expect, if its ABI says so.
One thing to note, though, is that modern compilers tend to be clever enough that correctness doesn't necessarily come at the cost of performance. Let's flesh out that example code:
#include <string.h>
double func(char *data) {
double d;
memcpy(&d, data, sizeof d);
return d;
}
...and throw it at a compiler:
$ clang -target arm -march=armv6 -mfpu=vfpv3 -mfloat-abi=hard -O1 -S test.c
...
func: # #func
.fnstart
# BB#0:
push {r4, r5, r11, lr}
sub sp, sp, #8
mov r2, r0
ldrb r1, [r0, #3]
ldrb r3, [r0, #2]
ldrb r12, [r0]
ldrb lr, [r0, #1]
ldrb r4, [r2, #4]!
orr r5, r3, r1, lsl #8
ldrb r3, [r2, #2]
ldrb r2, [r2, #3]
ldrb r0, [r0, #5]
orr r1, r12, lr, lsl #8
orr r2, r3, r2, lsl #8
orr r0, r4, r0, lsl #8
orr r1, r1, r5, lsl #16
orr r0, r0, r2, lsl #16
str r1, [sp]
str r0, [sp, #4]
vpop {d0}
pop {r4, r5, r11, pc}
OK, so it's playing things safe with a bytewise memcpy; at least it's inlined. But hey, ARMv6 does at least support unaligned word and halfword accesses if the CPU is configured appropriately - let's tell the compiler we're cool with that:
$ clang -target arm -march=armv6 -mfpu=vfpv3 -mfloat-abi=hard -O1 -S -munaligned-access test.c
...
func: # #func
.fnstart
# BB#0:
sub sp, sp, #8
ldr r1, [r0]
ldr r0, [r0, #4]
str r0, [sp, #4]
str r1, [sp]
vpop {d0}
bx lr
There we go, that's about the best you can do with just integer word loads. Now, what if we compile it for something a bit newer?
$ clang -target arm -march=armv7 -mfpu=neon-vfpv4 -mfloat-abi=hard -O1 -S test.c
...
func: # #func
.fnstart
# BB#0:
vld1.8 {d0}, [r0]
bx lr
I can guarantee that, even on a machine where it would "work", no undefined-behaviour-hackery would correctly load that unaligned double in fewer than one instructions. Note that NEON is the key player here - vld1 only requires the base address to be aligned to the element size, so for 8-bit elements it can never be unaligned. In the more general case (say, if it were a long long instead of a double) you might still need -munaligned-access to convince the compiler as before.
For comparison, let's just see how everyone's favourite mutant-grandchild-of-a-1970s-calculator-chip fares as well:
$ clang -O1 -S test.c
...
func: # #func
# BB#0:
movl 4(%esp), %eax
fldl (%eax)
retl
Yup, the correct code still also looks like the best code.
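For the message-parsing loop in the question, the memcpy approach can be wrapped in a small helper that also advances the read position. This is only a sketch: read_double is a hypothetical name, and it assumes the doubles in the buffer are already in the host's byte order and format, which the question implies but does not state.

```c
#include <string.h>

/* Copy one double out of a possibly unaligned byte buffer and advance
 * the cursor past it. memcpy has no alignment requirement on its
 * source, so this is well defined wherever the field happens to sit. */
double read_double(const char **cursor)
{
    double d;
    memcpy(&d, *cursor, sizeof d);
    *cursor += sizeof d;
    return d;
}
```

A caller would keep a const char * into the received message and call read_double(&p) for each double field, skipping non-double fields by adjusting p directly.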

Simple Assembly Language doubts

I have worked out some code for my assignment, and something tells me that I'm not doing it correctly. I hope someone can take a look at it.
Thank you!
AREA Reset, CODE, READONLY
ENTRY
LDR r1, = 0x13579BA0
MOV r3, #0
MOV r4, #0
MOV r2, #8
Loop CMP r2, #0
BGE DONE
LDR r5, [r1, r4]
AND r5, r5, #0x00000000
ADD r3, r3, r5
ADD r4, r4, #4
SUB r2, r2, #1
B Loop
LDR r0, [r3]
DONE B DONE
END
Write an ARM assembly program that will add the hexadecimal digits in register 1 and save the sum in register 0. For example, if r1 is initialized as follows:
LDR r1, =0x120A760C
When your program has run to completion, register 0 will contain the sum of 1+2+0+A+7+6+0+C.
You will need to use the following in your solution:
· An 8-iteration loop
· Logical shift right instruction
· The AND instruction (used to force selected bits to 0)
I know that I did not even use LSR. Where should I put it? I'm just getting started with assembly, and I hope someone can suggest improvements to this code.
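One way to see where the shift belongs is to model the algorithm in C first. In the posted loop, BGE DONE is taken on the first pass because r2 starts at 8, the AND with #0x00000000 clears every bit, and LDR r5, [r1, r4] uses the value in r1 as a memory address rather than as data. The sketch below is a C model of the intended computation, not the assembly solution itself; sum_hex_digits is a made-up name.

```c
#include <stdint.h>

/* Sum the eight hexadecimal digits of a 32-bit value. */
uint32_t sum_hex_digits(uint32_t value)
{
    uint32_t sum = 0;
    for (int i = 0; i < 8; i++) {   /* the required 8-iteration loop     */
        sum += value & 0xFu;        /* AND: force all but low 4 bits to 0 */
        value >>= 4;                /* LSR #4: bring the next digit down  */
    }
    return sum;
}
```

In assembly, each line of the loop body maps to one instruction: an AND with #0xF into a scratch register, an ADD into the running sum, and an LSR of r1 by 4, followed by decrementing the counter and branching back while it is non-zero.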