Are exceptions stacked by the Cortex-M hardware in thread-mode or handler mode?

On Cortex-M processors with MPUs (let's use Cortex-M4 to be specific, but I bet the answer is the same for e.g. M3), what privilege level does the hardware exception-entry stacking run at with respect to the MPU?
Suppose I'm running in unprivileged thread mode using the process stack (PSP), with the MPU set to only accept writes within a particular region (e.g. a user-mode process is running). When an exception occurs, before the handler executes (in handler mode), the hardware stacks registers r0-r3, r12, lr, pc and xPSR onto the PSP. Does this stacking also happen with unprivileged thread-mode permissions?
Specifically, suppose the process sets its SP to some arbitrary point in memory it should not be allowed to write to. Will the exception stacking result in a memory fault?

Coming back to this a year later after having dealt with this, the answer is that stacking occurs with whatever privilege was previously running.
So, if an interrupt occurs in unprivileged mode, the hardware will stack registers on the PSP using the existing MPU settings, as though unprivileged code were performing the stacking. If stacking would violate MPU rules, a MemManage fault occurs, and the MSTKERR field of the MemManage Fault Status Register will be set (page 4-25 of the Cortex-M4 user guide).
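To make the MSTKERR check concrete, here is a minimal sketch of decoding the relevant status bits. It assumes the usual Cortex-M3/M4 layout, where the MemManage Fault Status Register is the low byte of the CFSR at 0xE000ED28; the helper names are hypothetical and not part of any vendor header.

```c
#include <stdbool.h>
#include <stdint.h>

/* MMFSR bit positions (MMFSR is the low byte of the CFSR). */
#define MMFSR_MUNSTKERR (1u << 3) /* fault while unstacking on exception return */
#define MMFSR_MSTKERR   (1u << 4) /* fault while stacking on exception entry */

/* Hypothetical helper: true if the fault came from hardware stacking. */
static bool mmfsr_is_stacking_fault(uint32_t cfsr)
{
    return (cfsr & MMFSR_MSTKERR) != 0;
}

/* Hypothetical helper: true if the fault came from hardware unstacking. */
static bool mmfsr_is_unstacking_fault(uint32_t cfsr)
{
    return (cfsr & MMFSR_MUNSTKERR) != 0;
}
```

On a real target, a MemManage handler would pass in the value read from `*(volatile uint32_t *)0xE000ED28`; taking the value as a parameter keeps the decoding testable on a host machine.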

About MPU rule violations and MSTKERR / MUNSTKERR, when an exception occurs in unprivileged software and the MPU is enabled:
On exception entry, if the base address of the stack memory allocated to the unprivileged software is NOT aligned to its stack size, the Cortex-M4 generates a MemManage fault and the MSTKERR field is set.
On exception return, similarly, if the base address of the allocated stack memory is NOT aligned to its stack size, the Cortex-M4 generates a MemManage fault and the MUNSTKERR field is set.
For example, MPU_RASR.SIZE = 0x7 means the MPU region for the stack is 2^(7+1) = 256 bytes, so MPU_RBAR.ADDR must be a multiple of 256 such as 0x00000100, 0x00000200, etc.; otherwise the Cortex-M4 generates the corresponding MemManage fault immediately on exception entry/return.
For more information, see section 4.5.4, MPU Region Base Address Register, in DUI0553, the Cortex-M4 Devices Generic User Guide.
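The size and alignment arithmetic from the example above can be checked on a host machine. This is only a sketch: the function names are hypothetical, and it models nothing beyond the 2^(SIZE+1) region size and the requirement that the base address be a multiple of that size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Region size in bytes encoded by an MPU_RASR.SIZE field: 2^(SIZE+1). */
static uint64_t mpu_region_bytes(uint32_t size_field)
{
    return (uint64_t)1 << (size_field + 1u);
}

/* True if 'base' is a multiple of the region size, i.e. a usable
 * MPU_RBAR.ADDR value for that SIZE. */
static bool mpu_base_is_aligned(uint32_t base, uint32_t size_field)
{
    return ((uint64_t)base & (mpu_region_bytes(size_field) - 1u)) == 0;
}
```

With SIZE = 0x7, `mpu_region_bytes` gives 256, so 0x00000100 and 0x00000200 pass the check while an address such as 0x00000180 does not.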

Related

How does gdb read the register values of a program / process it's debugging? How are registers associated with a process?

I wrote a short program in C++:
#include <iostream>
using namespace std;
int main() {
    int x = 10;
    int y = 20;
    cout << x + y << endl;
    return 0;
}
Just out of curiosity, I wanted to understand what a program does under the hood, so I was playing with gdb and came across the info registers command. When I use info registers in gdb I get output like this:
(gdb) info registers
rax 0x400756 4196182
rbx 0x0 0
rcx 0x6 6
rdx 0x7fffffffd418 140737488344088
rsi 0x7fffffffd408 140737488344072
rdi 0x1 1
rbp 0x7fffffffd320 0x7fffffffd320
rsp 0x7fffffffd320 0x7fffffffd320
r8 0x7ffff7ac1e80 140737348640384
r9 0x7ffff7dcfea0 140737351843488
r10 0x7fffffffd080 140737488343168
r11 0x7ffff773a410 140737344939024
r12 0x400660 4195936
r13 0x7fffffffd400 140737488344064
r14 0x0 0
r15 0x0 0
rip 0x40075a 0x40075a <main+4>
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
I understand these are registers and their values, but what I want to know is how/why registers are associated with a process. Shouldn't the register values be changing continuously as different processes are scheduled by the operating system? I looked up the info registers command and this is what I found, but it is still confusing.
info registers -> Prints the names and values of all registers except
floating-point and vector registers (in the selected stack frame).

Registers change all the time. In fact, even the debugger changes register values, as it has to run itself.
However, while you look at your program with a debugger, the debugger suspends your running process. As part of suspending, the CPU state is saved to RAM. The debugger understands this, and can just look at the suspended state in RAM. Say that register R1 was saved to address 0x1234 on suspending, then the debugger can just print the bytes stored at that address.

Each thread/process has its own register values. The user-space "architectural state" (register values) is saved on entering the kernel via a system call or interrupt. (This is true on all OSes).
See What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? for a look at Linux's system-call entry points, with the hand-written asm that actually saves registers on the process's kernel stack. (Each thread has its own kernel stack, in Linux).
In multi-tasking OSes in general, every process/thread has its own memory space for saving state, so context switches work by restoring the saved state from the thread being switched to. This is a bit of a simplification, because there's kernel state vs. saved user-space state (see the footnote).
So any time a process isn't actually running on a CPU core, its register values are saved in memory.
The OS provides an API for reading/writing the saved register state, and memory, of other processes.
In Linux, this API is the ptrace(2) system call; it's what GDB uses to read register values and to single-step. Thus, GDB reads saved register values of the target process from memory, indirectly via the kernel. GDB's own code doesn't use any special x86 instructions, or even load / store from any special addresses; it just makes system calls because access to another process's state has to go through the kernel. (Well I think a process could map another process's memory into its own address space, if Linux even has a system call for that, but I think memory reads/writes actually go through ptrace just like register accesses.)
(I think) If the target process was currently executing (instead of suspended) when another process made a ptrace system call that read or wrote one of its register values, the kernel would have to interrupt it so its current state would be saved to memory. This doesn't normally happen with GDB: it only tries to read register values when it's suspended the target process.
ptrace is also what strace uses to trace system calls. See Playing with ptrace, Part I from Linux Journal. strace ./my_program is fantastically useful for systems programming, especially when making system calls from hand-written asm, to decode the args you're actually passing, and the return values.
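As a rough illustration of reading another process's saved registers through the kernel, the sketch below forks a child that stops itself and then fetches its general-purpose registers with ptrace. It assumes Linux with glibc-style headers; `read_stopped_child_regs` is a hypothetical helper, and GDB's real code path is considerably more involved.

```c
#include <elf.h>        /* NT_PRSTATUS */
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>    /* struct iovec */
#include <sys/user.h>   /* struct user_regs_struct */
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, let it stop itself, and ask the kernel
 * for its saved general-purpose registers. Returns the number of bytes the
 * kernel copied into *regs, or -1 on failure. */
static long read_stopped_child_regs(struct user_regs_struct *regs)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: volunteer to be traced, then stop so the parent can look. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);
        _exit(0);
    }

    long result = -1;
    int status = 0;
    /* WUNTRACED makes waitpid return even if tracing could not be set up. */
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status)) {
        struct iovec iov = { .iov_base = regs, .iov_len = sizeof *regs };
        /* The kernel copies the saved register state of the stopped child. */
        if (ptrace(PTRACE_GETREGSET, pid, (void *)(long)NT_PRSTATUS, &iov) == 0)
            result = (long)iov.iov_len;
    }

    /* Clean up: the child is only a demonstration target. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return result;
}
```

The parent never touches the child's registers directly. It only makes system calls, and the kernel copies out the state it saved when the child stopped. On x86-64 the filled structure has fields such as `rip` and `rsp`, matching what info registers prints.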
Footnotes:
In Linux, the actual switch to a new thread happens inside the kernel, from kernel context to kernel context. This saves "only" the integer registers on the kernel stack, sets rsp to the right place in the other thread's kernel stack, then restores the saved registers. So there's a function call that, when it returns, is executing in kernel mode for the new thread, with per-CPU kernel variables set appropriately. User-space state for the new thread is eventually restored the same way it would have been if the system call or interrupt that originally entered the kernel from user-space had returned without calling the scheduler. i.e. from the state saved by the system call or interrupt kernel entry point. Lazy / eager FPU state saving is another complication; the kernel generally avoids touching the FPU so it can avoid saving/restoring FPU state when just entering the kernel and returning back to the same user-space process.

Custom debugger based on windows api - debugging multithreaded process

In my debugger I set a particular memory address to 0xCC (int3), and during execution one of the threads reaches this address and an exception is thrown. While handling the exception, I decrement the IP register so that it points back at the breakpoint address, and replace 0xCC with the original byte. I also set a flag in the thread context to raise an exception after one instruction executes, because I need to write the 0xCC byte back.
Problem:
The code executes correctly, but I realized there may be a bug. After receiving the exception I restore the original byte and set the flag in the thread so that control returns to the debugger right after it executes one instruction (which lets me write int3 back). That sounds fine, but I have observed that while the original byte is in place, another thread can also execute this instruction without raising an exception (I think this is related to thread switching).

How to handle this exception? (zero esp)

How to handle this exception?
__asm
{
    mov esp, 0
    mov eax, 0
    div eax
}
This is not handled with try/except or SetUnhandledExceptionFilter().

Assuming this is running in an operating system, the operating system will catch the divide by zero, and then ATTEMPT to form an exception/signal stackframe for the application code. However, since the user-mode stack is "bad", it can't.
There is really no way for the operating system to deal with this, other than to kill the application. [Theoretically, it could make up a new stack from some dynamically allocated memory, but that's pretty pointless, as there is no (always working) way for the application itself to recover to a sane state].
Don't set the stack pointer to something that isn't the stack - or if you do store "random" data in the stack pointer register, do not have exceptions. It's the same as "don't aim a gun at your foot and pull the trigger, unless you want to be without your foot".
Edit:
If the code is running in "kernel mode" rather than "usermode", it's even more "game over", since it will "double-fault" - the processor hits a divide by zero exception handler, which tries to write to the stack, and when it does so, it faults. This is now a "fault within a fault handler", aka a "double-fault". The typical setup of the double-fault handler is to have a separate stack, which then recovers the fault handler. But it's still game over - we don't know how to return to the original fault handler [or how to find out what the original fault handler was].
If there is no "new stack" for the double-fault handler, an x86 processor will triple fault. Typically, a triple fault makes the processor restart [technically, it halts the processor with a special combination of bits signalled on the address bus to indicate that it's a "triple fault". The typical PC northbridge then resets the processor in recognition that the triple fault is an unrecoverable situation. This is why your PC sometimes simply reboots when you have poor quality drivers].

It's not a good idea to try to interact with a higher-level language's exception mechanism from embedded assembly. The compiler can do "magic" that you cannot match, and there's no (portable) way to tell the compiler that "this assembly code might throw an exception".

Threads are blocked in malloc and free, virtual size

I'm running a 64-bit multi-threaded program on Windows Server 2003 (x64). It has run into a case where some of the threads seem to be blocked in malloc or free forever. The stack traces look like this:
ntdll.dll!NtWaitForSingleObject() + 0xa bytes
ntdll.dll!RtlpWaitOnCriticalSection() - 0x1aa bytes
ntdll.dll!RtlEnterCriticalSection() + 0xb040 bytes
ntdll.dll!RtlpDebugPageHeapAllocate() + 0x2f6 bytes
ntdll.dll!RtlDebugAllocateHeap() + 0x40 bytes
ntdll.dll!RtlAllocateHeapSlowly() + 0x5e898 bytes
ntdll.dll!RtlAllocateHeap() - 0x1711a bytes
MyProg.exe!malloc(unsigned __int64 size=0) Line 168 C
MyProg.exe!operator new(unsigned __int64 size=1) Line 59 + 0x5 bytes C++
ntdll.dll!NtWaitForSingleObject()
ntdll.dll!RtlpWaitOnCriticalSection()
ntdll.dll!RtlEnterCriticalSection()
ntdll.dll!RtlpDebugPageHeapFree()
ntdll.dll!RtlDebugFreeHeap()
ntdll.dll!RtlFreeHeapSlowly()
ntdll.dll!RtlFreeHeap()
MyProg.exe!free(void * pBlock=0x000000007e8e4fe0) C
BTW, the parameter values passed to operator new shown here are not correct, maybe due to optimization.
Also, at the same time, Process Explorer shows the virtual size of this program as 10GB, but the private bytes and working set are very small (<2GB). We do have some threads using VirtualAlloc, but in a way that commits the memory in the call, and these threads are not blocked.
m_pBuf = VirtualAlloc(NULL, m_size, MEM_COMMIT, PAGE_READWRITE);
......
VirtualFree(m_pBuf, 0, MEM_RELEASE);
This looks strange to me: it seems a lot of virtual address space is reserved but not committed, and malloc/free is blocked on a lock. I suspect there is some corruption in memory or in an object, so I plan to turn on gflags with pageheap to troubleshoot this.
Has anyone had a similar experience? Could you share it so I can get more hints?
Thanks a lot!

Your program is using PageHeap, which is intended for debugging only and imposes a ton of memory overhead. To see which programs have PageHeap activated, do this at a command line.
% Gflags.exe /p
To disable it for your process, type this (for MyProg.exe):
% Gflags.exe /p /disable MyProg.exe

Pageheap.exe detects most heap-related bugs - try Pageheap
You should also look into "the param values passed to the new ...": does this corruption occur in debug mode? Make sure all optimizations are disabled.

If your system is running out of memory, the OS may be swapping. That means that for a single allocation, in the worst case, the OS might need to locate the best candidate for swapping, write it to disk, free the memory and return it. Are you sure the threads are locked, or might they just be running very slowly? Could another thread be swapping memory to disk while these two threads wait for their calls to malloc/free to complete?

My preferred solution for debugging leaks in native applications is to use UMDH to take consecutive snapshots of the user-mode heap(s) in the process and then run UMDH again to diff the snapshots. Any pattern of change in the snapshots is likely a leak.
You get a count and size of memory blocks bucketed by their allocating callstack so it's reasonably straightforward to see where the biggest hogs are.
The user-mode dump heap (UMDH) utility
works with the operating system to
analyze Windows heap allocations for a
specific process.

trap invalid opcode rip rsp

We see a couple of the messages below in /var/log/messages for one of our applications:
Sep 18 03:24:23 <machine_name> kernel: application_name[14682] trap invalid opcode rip:f6c6e3ce rsp:ffc366bc error:0
...
Sep 18 03:19:35 <machine_name> kernel: application_name[4434] general protection rip:f6cd43a2 rsp:ffdfab0c error:7b2
I am not able to work out what this output means or how we can track down the function / code that is causing the issue. Also, what do 'trap invalid opcode' and 'general protection' mean?

Usually that means that your program's instruction pointer points to data or garbage. That's commonly caused by writing to stray pointers and such.
One scenario would be that your code writes (through a stray pointer) over some class' virtual table, replacing the member function addresses with nonsense. The next time you call one of the class' virtual functions, your program will interpret the garbage as an address and jump to that address. If whatever data lies at this address happens not to be a valid machine code instruction for your processor, you would see this error.

Another possibility that can cause 'invalid' opcodes is hardware that does not support newer opcodes/instruction sets (SSE 4/5), or that is not from the right manufacturer (both AMD and Intel have some specific opcodes that work only on their processors), or code that lacks permission to execute certain ops (though this would probably show up as something else).
From the above I would take RIP to be 'register(?) instruction pointer' and RSP to be 'register stack pointer', in which case you could use a debugger, set a hardware execution breakpoint on the specified address (RIP) and trace back what is calling it (it seems you're using Linux or Unix, so this is quite vague). If you are on Windows, try using a custom exception filter to capture the EXCEPTION_ILLEGAL_INSTRUCTION event to get a little more information.
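To pull the faulting addresses out of such log lines before going to a debugger, a small parser can help. This is a sketch that assumes the exact line layout quoted in the question (process name, pid in brackets, then rip, rsp and error as hexadecimal fields); `struct trap_record` and `parse_trap_line` are hypothetical names.

```c
#include <stdio.h>

/* Fields carried by one kernel trap line (layout assumed from the question). */
struct trap_record {
    int pid;              /* number in brackets after the process name */
    unsigned long rip;    /* instruction pointer at the time of the trap */
    unsigned long rsp;    /* stack pointer at the time of the trap */
    unsigned long error;  /* error code reported with the trap */
};

/* Returns 0 and fills *out if the line matches the assumed layout, else -1. */
static int parse_trap_line(const char *line, struct trap_record *out)
{
    /* Skip everything up to '[', read the pid, skip the trap description up
     * to the ':' that follows "rip", then read the three hex fields. */
    int n = sscanf(line, "%*[^[][%d]%*[^:]:%lx rsp:%lx error:%lx",
                   &out->pid, &out->rip, &out->rsp, &out->error);
    return n == 4 ? 0 : -1;
}
```

The extracted rip value is the address to look up or set a breakpoint on, as suggested above.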
