debug port problem while running Lauterbach CMM script - trace32

Currently I'm developing Lauterbach CMM scripts to automate test cases for the SPC58NG84.
As part of each test case:
- I need to reset the target system before and after the test case.
- I need to read and write variable values from the C code.
When I run the test scripts I get the error 'debug port problem', and in the Watch window all variable values show BUS ERROR.
Can you please let me know how to debug this issue?
What are the possible causes of 'debug port problem'?
Error messages in the AREA window:
CO:2 error: CPU suddenly left debug mode (OSR=0x3C1)
CO:0 JTAGID=0x11110041
Warning: CO:1 Core currently in reset. Stopping core on activation.
CMM Script:
Test precondition: reset target
Break.Delete
WAIT 100.ms
SYStem.Mode Down
SYStem.DETECT.CPU
SYStem.Mode Up
B:: Go
WAIT 500.ms
Test case execution:
-- Read and write variables in software --
Test postcondition: reset target
Break
Break.Delete
WAIT 100.ms
SYStem.Mode Down
SYStem.Mode Up
B:: Go
WAIT 1000.ms

The error 'debug port problem' after the Break command usually means that the target application crashed so badly that the core no longer responds to the debugger's halt request.
To debug the problem, make sure that your boot loader sets up the interrupt vector start address (IVPR) as early as possible, and also place branch-to-self instructions at all interrupt handler addresses, unless interrupt handler code already exists.
Once this is done, set program breakpoints on the interrupt handlers typically involved in crashes: machine check, data storage, instruction storage, and program interrupt. Doing so should catch the core when the crash occurs, and SRR0 (or CSRR0/MCSRR0, depending on the interrupt type) will show you the address at which the problem occurred.
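A minimal PRACTICE sketch of that setup is shown below; the vector base &ivpr and the per-vector offsets are assumptions, so check your boot loader and the core reference manual for the actual values:
LOCAL &ivpr
&ivpr=0x00000000                          ; assumed IVPR value set by the boot loader
Break.Set &ivpr+0x010 /Program /Onchip    ; machine check handler
Break.Set &ivpr+0x020 /Program /Onchip    ; data storage handler
Break.Set &ivpr+0x030 /Program /Onchip    ; instruction storage handler
Break.Set &ivpr+0x060 /Program /Onchip    ; program interrupt handler
Go
; once the target stops on one of these breakpoints, inspect SRR0/CSRR0/MCSRR0
Register.view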

Related

Kernel module to intercept system calls causes issues in execution of userspace programs

I've been trying to write a kernel module (using SystemTap) that would intercept system calls, capture their information, and add it to a system call buffer region that is kmalloc'd. I have implemented an mmap file operation so that a user space process can access this kmalloc'd region and read from it.
Note: For now, the module only intercepts the memfd_create system call. To test this out, I have compiled a test application that calls memfd_create twice.
SystemTap script/kernel module code
In addition to the kernel module, I also wrote a user space application that periodically reads the system calls from this buffer, determines whether each call is legitimate or malicious, and then adds a response to a response buffer region (also part of the kmalloc'd region and accessible via mmap) indicating whether to let the system call proceed or to terminate the calling process.
User space application code
The kernel module also has a timer that fires every few milliseconds to check the response buffer for responses added by user space. Depending on the response, the kernel module either terminates the calling process or lets it proceed.
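For reference, here is a minimal sketch (not the asker's actual code) of the kind of mmap file operation described above, exposing a kmalloc'd buffer to user space with remap_pfn_range(); the names syscall_buf, BUF_SIZE and sysbuf_mmap are illustrative assumptions:
/* Hedged sketch: expose a physically contiguous, kmalloc'd buffer to user space. */
#include <linux/fs.h>
#include <linux/io.h>
#include <linux/mm.h>
#include <linux/module.h>
#include <linux/slab.h>

#define BUF_SIZE (4 * PAGE_SIZE)   /* assumed size of the shared region */
static void *syscall_buf;          /* allocated elsewhere with kmalloc(BUF_SIZE, GFP_KERNEL) */

static int sysbuf_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long len = vma->vm_end - vma->vm_start;
    unsigned long pfn = virt_to_phys(syscall_buf) >> PAGE_SHIFT;

    if (len > BUF_SIZE)
        return -EINVAL;
    /* kmalloc memory is physically contiguous, so one remap_pfn_range() call suffices */
    return remap_pfn_range(vma, vma->vm_start, pfn, len, vma->vm_page_prot);
}

static const struct file_operations sysbuf_fops = {
    .owner = THIS_MODULE,
    .mmap  = sysbuf_mmap,
};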
The issue I am facing is that after intercepting a few system calls (I keep executing the test application) and processing them properly, I start running into issues executing normal commands in user space. For example, a simple command like ls:
[virsec#redhat7 stap-test]$ ls
Segmentation fault
[virsec#redhat7 stap-test]$ strace ls
execve("/usr/bin/ls", ["ls"], [/* 30 vars */]) = -1 EFAULT (Bad address)
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV +++
Segmentation fault
This happens with every terminal command that I run. The dmesg output shows nothing but the printk debug output of the kernel module. There is no kernel panic; the kernel module and the user space application are still running and waiting for the next system call to intercept. What do you think could be the issue? Let me know if you need any more information. I have not posted code snippets inline because I think the code makes much more sense as a whole.

Rare EXCEPTION_ACCESS_VIOLATION when debugging any process started with CREATE_SUSPENDED

While writing an x86 WinAPI-based debugger, I've encountered a rare condition where the debuggee (which usually works fine) suddenly terminates with EXCEPTION_ACCESS_VIOLATION after I attach to it with my native debugger. I can reliably reproduce this on seemingly any application (tried on a .NET Hello World-style application and on notepad.exe on multiple Windows 10 machines).
Essentially I've written a simple WaitForDebugEvent loop:
CreateProcessW(L"C:\\Windows\\SYSWOW64\\notepad.exe", […], CREATE_SUSPENDED, […]);
DebugActiveProcess(processId);
DEBUG_EVENT debugEvent = {};
while (WaitForDebugEvent(&debugEvent, INFINITE)) {
switch (debugEvent.dwDebugEventCode) {
// log all the events
}
ContinueDebugEvent(debugEvent.dwProcessId, debugEvent.dwThreadId, DBG_EXCEPTION_NOT_HANDLED);
}
DebugActiveProcessStop(processId);
(here's the full listing: I won't paste it all here, because there's some additional non-essential boilerplate there; the MCVE is 136 lines long)
For the sake of an example, I'll just log all the debugger events and detect whether the debuggee is ready to "proceed normally" or whether it will terminate due to an exception.
Most of the time, my debugging session looks like this:
CREATE_PROCESS_DEBUG_EVENT (which reports creation of both the process and its initial thread)
LOAD_DLL_DEBUG_EVENT (I was never able to get the name for this DLL, but this is documented in MSDN)
CREATE_THREAD_DEBUG_EVENT (which, I suspect, is a thread injected by debugger)
LOAD_DLL_DEBUG_EVENT […]: after this, many DLLs get loaded into the target process and everything looks okay; the process works as intended
But sometimes (in about 1.5% of all runs), the event sequence changes:
CREATE_PROCESS_DEBUG_EVENT
LOAD_DLL_DEBUG_EVENT
CREATE_THREAD_DEBUG_EVENT
EXCEPTION_DEBUG_EVENT: EXCEPTION_ACCESS_VIOLATION (which I never was able to gather details for: it reports a DEP violation, and the address is empty)
After that, I cannot proceed with debugging, because my debuggee is in an exception state and will terminate soon. I was never able to catch notepad.exe crashing without my debugger attached (and I doubt it is so broken that it crashes for no reason), so I suspect that my debugger causes these exceptions.
One bizarre detail is that I could "fix" the situation by calling Sleep(1) immediately after WaitForDebugEvent. So this is possibly some sort of race condition, but a race condition between what? Between the debugger thread and other threads in the debuggee? Is that a thing? How are we supposed to debug other applications, then? How could actual debuggers work if it is?
I couldn't reproduce the issue with the same code compiled for x64 CPU (and debugging an x64 process).
What could actually cause this erroneous behavior? I've carefully read the documentation about the API functions I call, and checked some other debugger examples online, but still wasn't able to find what's wrong with my debugger: it looks like I follow all the right conventions.
I have tried to debug my debuggee with WinDbg while it is still paused in my debugger, but had no luck doing that. First of all, it's difficult to attach to the debuggee with another debugger (WinDbg only allows non-intrusive mode, which seems less functional), and the call stacks for the process's threads usually aren't meaningful.
Steps to reproduce
Check out this repository, compile it with MSVC, and then execute in cmd:
Debug\NetRuntimeWaiter.exe > log.txt
It is important to redirect the output to the log file and not show it in the terminal: without that, the timings for the log writer change and the issue won't reproduce (due to the possible race condition mentioned earlier?).
Usually the program will start and terminate 1000 notepads in about 10 seconds, and 10-15 of the 1000 invocations will hit the error condition (i.e. EXCEPTION_ACCESS_VIOLATION).
DebugActiveProcess (and the undocumented DbgUiDebugActiveProcess that it calls internally) has a serious design problem: after calling NtDebugActiveProcess, it creates a remote thread in the target process via DbgUiIssueRemoteBreakin. As a result, a new thread, DbgUiRemoteBreakin, is created in the target process; this thread calls DbgBreakPoint and then RtlExitUserThread.
None of this is documented or explained, apart from this note in the DebugActiveProcess documentation:
After all of this is done, the system resumes all threads in the
process. When the first thread in the process resumes, it executes a
breakpoint instruction that causes an EXCEPTION_DEBUG_EVENT
debugging event to be sent to the debugger.
Of course this is misleading. Why would DbgUiRemoteBreakin be the first thread? And which thread resumes first is undefined. Why not state precisely that an additional (but not the first) thread is created in the process, and that this thread executes the breakpoint?
However, when the process is already running, creating this additional thread causes no problems. But when we create the process in a suspended state and then simply call DebugActiveProcess, DbgUiRemoteBreakin really does become the first thread to execute in the process, and process initialization is performed on this thread instead of on the originally created first thread. On XP this always caused process initialization to fail at the connect-to-csrss phase (csrss accepts the connection only from the first created thread of the process). On later systems this is fixed and the process can usually execute as normal, but not always, because the thread on which initialization was performed has exited; this can cause subtle problems.
The solution here is to not use DebugActiveProcess, but NtDebugActiveProcess in its place.
The debug object can be created either via DbgUiConnectToDbg(), retrieving it afterwards via DbgUiGetThreadDebugObject() (the system stores the debug object in the thread's TEB), or directly by calling NtCreateDebugObject.
Also, if we create the debuggee process from another process (B), we can do the following:
- duplicate the debug object from the debugger process into process B;
- call DbgUiSetThreadDebugObject(hDbg) just before calling CreateProcessW with DEBUG_ONLY_THIS_PROCESS or DEBUG_PROCESS; the system will use DbgUiGetThreadDebugObject() to get the debug object from your thread and pass it to the low-level process-creation API;
- afterwards, remove the debug object from your thread via DbgUiSetThreadDebugObject(0).
It really does not matter which process creates the process with the debug object; what matters is which process handles the events posted to that debug object.
All the undocumented API definitions can be taken from ntdbg.h; then link with ntdll.lib or ntdllp.lib.
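A hedged sketch of the simplest variant (attaching from the debugger process itself, so no handle duplication is needed); the prototypes are the undocumented ntdll ones mentioned above and are declared here by hand as an assumption:
/* Sketch only: create the process with CREATE_SUSPENDED, then attach with
 * NtDebugActiveProcess so that no DbgUiRemoteBreakin thread is injected.
 * Link against ntdll.lib. */
#include <windows.h>

typedef LONG NTSTATUS;
NTSTATUS NTAPI DbgUiConnectToDbg(void);
HANDLE   NTAPI DbgUiGetThreadDebugObject(void);
NTSTATUS NTAPI NtDebugActiveProcess(HANDLE ProcessHandle, HANDLE DebugObjectHandle);

void attachWithoutRemoteBreakin(const PROCESS_INFORMATION *pi)
{
    DbgUiConnectToDbg();                        /* create a debug object for this thread (stored in the TEB) */
    HANDLE hDbg = DbgUiGetThreadDebugObject();
    NtDebugActiveProcess(pi->hProcess, hDbg);   /* attach without injecting DbgUiRemoteBreakin */
    ResumeThread(pi->hThread);                  /* the original initial thread now performs process initialization */
    /* ...continue with the usual WaitForDebugEvent / ContinueDebugEvent loop... */
}
With this approach the suspended process's own initial thread, not an injected one, runs process initialization.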

how to quit currently running Trace32 from command line

I am automating regression tests with TRACE32. Before the regression starts, if any TRACE32 process is in use, I want to kill that process. The problem is that if I kill it at the OS level, then when the regression starts, the GUI pops up a dialog saying "TRACE32 device already in use. Reset device and connect?" and I have to manually click Yes to continue the regression. Is there any way to quit the currently running TRACE32 properly from the command line, so that the reset dialog will not show when TRACE32 starts the next time? Or is there any command I can add to the .cmm file in my regression to skip this question dialog? I have tried putting RESet at the beginning of the .cmm, which does not help.
First of all, try to end all your automated tests with the TRACE32 command QUIT. This will close TRACE32. However, something might go wrong in your tests, so the QUIT command might never be reached and TRACE32 keeps running.
So, secondly, start TRACE32 with an open Remote API port. Add the following lines to your TRACE32 config file (by default this is c:\T32\config.t32):
RCL=NETASSIST
PORT=20000
Before and after the block there must be an empty line. You can also choose any other number for PORT, which specifies a UDP/IP port that gets opened by TRACE32. (If more than one TRACE32 instance is active at the same time, use a different port number for each instance.)
If TRACE32 was started with an open Remote API port, you can send a QUIT command to the still-running application instead of terminating it with a kill command. To send the QUIT command, use the command line tool t32rem.exe as follows:
t32rem localhost port=20000 QUIT
Finally, we need a way to handle the (hopefully rare) situation where TRACE32 has crashed and is no longer responsive. Then you have to kill it, of course. For a proper reconnect afterwards, use the setting CONNECTIONMODE=AUTOCONNECT in the PBI= section of your TRACE32 config file (by default this is c:\T32\config.t32). This setting performs the "Reset device and connect" without asking you.
Putting it all together, your config file should look something like this:
OS=
ID=myT32
SYS=C:\T32
PBI=
USB
CONNECTIONMODE=AUTOCONNECT
RCL=NETASSIST
PORT=20000

Phantom Input When Running Green Hills Debugger

I'm running on a Marvell Monahans PXA320 under Green Hills INTEGRITY 5.0.10, using MULTI 4.2.3 for development and an RTSERV connection for debugging. I've been asked to take over a menu-driven program.
I've noticed that if I halt the program (to modify breakpoints) and then resume it, the task gets into an infinite loop displaying the menu in the debugger I/O tab. After each instance of the menu that gets printed, it says that I have made an illegal selection. So, some input is apparently being fed into the task as if I had typed it in (and this input obviously corresponds to an invalid menu selection). I do not see on the display what this phantom input is.
Is there anything I can do to prevent a halt / resume from screwing up the I/O?
Thanks,
Dave
My first guess is that getc() (or your equivalent) is returning -1. This can happen if your input buffers overflowed as a result of halting the application. I/O keeps flowing while the application is halted...
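If that is what is happening, a defensive check in the input loop can at least stop the menu from spinning; this is an illustrative sketch (read_menu_choice is a hypothetical helper, not from the original program):
#include <stdio.h>

/* Sketch: tolerate getc() returning EOF/-1 after a halt/resume instead of
 * treating it as an (always invalid) menu selection. */
int read_menu_choice(void)
{
    for (;;) {
        int c = getc(stdin);
        if (c == EOF) {
            clearerr(stdin);   /* halting may have left the stream in an error/EOF state */
            continue;          /* retry instead of reporting "illegal selection" */
        }
        return c;
    }
}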
It is generally not a good idea to halt the program when debugging with INTEGRITY. You're generally better off attaching the debugger to a single thread (something idle or infrequently used), setting an "any-task" breakpoint in that thread, and then resuming the thread. (Don't close the window! Doing so will delete the breakpoint.) You'll see a "DebugBrk" status on the thread that hits the breakpoint; then you can double-click and attach to that specific thread.
Following that alternate procedure should (hopefully!) prevent the I/O error.

How to log the segmentation faults and run time errors which can crash the program, through a remote logging library?

What is the technique for logging segmentation faults and runtime errors, which crash the program, through a remote logging library?
The language is C++.
Here is a solution for printing a backtrace when you get a segfault, as an example of what you can do when such an error happens.
That leaves you with the problem of logging the error through the remote library. I would suggest keeping the signal handler as simple as possible and logging to a local file, because you cannot assume that a previously initialized logging library still works correctly once a segmentation fault has occurred.
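A minimal sketch of that idea, assuming a Linux/glibc target and its backtrace API (the file name and handler name are illustrative):
#include <execinfo.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static int g_crashlog_fd = -1;       /* opened at startup, before any crash */

static void segv_handler(int sig)
{
    void *frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, g_crashlog_fd);   /* writes straight to the fd, no malloc */
    signal(sig, SIG_DFL);            /* restore the default handler... */
    raise(sig);                      /* ...and re-raise so a core dump is still produced */
}

int main(void)
{
    g_crashlog_fd = open("crash.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    signal(SIGSEGV, segv_handler);
    /* ... application code ... */
    return 0;
}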
What is the technique for logging segmentation faults and runtime errors, which crash the program, through a remote logging library?
From my experience, trying to log debugging messages (remotely or into a file) while the program is crashing might not be very reliable, especially if the app takes the system down along with it:
With a TCP connection you might lose the last several messages while the system is crashing. (TCP maintains packet order and retransmits lost data, AFAIK, so if the app just quits, some data can be lost before it is transmitted.)
With a UDP connection you might lose messages entirely, or receive them out of order, because of the nature of UDP.
If you're writing into a file, the OS might discard the most recent changes (buffers not flushed, a journaled filesystem reverting to an earlier state of the file).
Flushing buffers after every write, or sending messages via TCP/UDP, might incur performance penalties for a program that produces thousands of messages per second.
So as far as I know, a good approach is to maintain an in-memory plaintext log and write a core dump once the program has crashed. This way you'll be able to find the contents of the log within the core dump. Writing into an in-memory log is also significantly faster than writing into a file or sending messages over the network. Alternatively, you could use some kind of "dual logging": write every debug message immediately into the in-memory log, and then send it asynchronously (on another thread) to a log file or over the network.
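A minimal sketch of such an in-memory log, a fixed-size ring buffer that ends up inside the core dump (the names g_logbuf and mem_log are illustrative assumptions):
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define LOG_SIZE (1 << 20)            /* keep the most recent ~1 MiB of messages */
static char g_logbuf[LOG_SIZE];       /* static, so it lands in the core dump */
static size_t g_logpos;

static void mem_log(const char *fmt, ...)
{
    char line[256];
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(line, sizeof line, fmt, ap);
    va_end(ap);
    if (n <= 0)
        return;
    if (n >= (int)sizeof line)
        n = (int)sizeof line - 1;     /* vsnprintf reports the untruncated length */
    for (int i = 0; i < n; ++i) {     /* wrap around when the buffer is full */
        g_logbuf[g_logpos] = line[i];
        g_logpos = (g_logpos + 1) % LOG_SIZE;
    }
}
This sketch is not thread-safe; a real implementation would use an atomic write index or per-thread buffers.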
Handling of exceptions:
Platform-specific. On Windows you can use _set_se_translator to either generate a backtrace or translate platform (SEH) exceptions into C++ exceptions.
On Linux, I think you should be able to install a handler for the SIGSEGV signal.
While catching the segfault sounds like a decent idea, instead of trying to handle it from within the program it makes sense to generate a core dump and bail. On Windows you can use MiniDumpWriteDump from within the program, and on Linux the system can be configured to produce core dumps from the shell (ulimit -c, I think?).
I'd like to suggest some solutions:
Use core dumps: start a daemon to monitor and collect core dumps and send them to your host.
Use GDB (with gdbserver): you can debug remotely and see the backtrace after a crash.
To catch the segfault signal and send a log accordingly, read this post:
Is there a point to trapping "segfault"?
If it turns out that you won't be able to send the log from a signal handler (maybe the crash occurred before the logger was initialized), then you may need to write the info to a file and have an external entity send it remotely.
EDIT: Putting back some original info on how to send the core file remotely too.
To be able to send the core file remotely, you'll need an external entity (a different process than the one that crashed) that will "wait" for core files and send them remotely as they appear (possibly using scp). Additionally, the crashing process could catch the segfault signal and notify the monitoring process that a crash has occurred and that a core file will be available soon.