How to know & log which instruction caused SIGABRT in my macOS daemon? - c++

I've written a macOS launch daemon in C++ (it runs on an Apple M2 chip, which I believe uses the ARM64 architecture). I noticed in the log that once in a while my daemon gets a SIGABRT whose sending pid is my own process:
SIGNAL: SIGABRT, sig_info=0x16bbea7c0 >
{si_signo=6, si_errno=0, si_code=0, si_pid=3320:"MyDaemon", si_uid=0, si_status=6,
si_addr=0x18b0ab224, si_value=0x0, si_band=0}
The handler for the signal allows me to retrieve the context of the signal:
void signalCallback(int sig, siginfo_t *info, void *context)
{
}
The question is: how do I retrieve the faulty operation that caused SIGABRT?
Or better yet, a call stack?
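A minimal sketch, assuming the handler is installed early in main(): a SA_SIGINFO handler that logs the interrupted program counter from the context argument and then a symbolized call stack via backtrace() and backtrace_symbols_fd() from <execinfo.h>, which exist on both macOS and glibc. The register field names are platform-specific; the macOS arm64 path (uc_mcontext->__ss.__pc) follows the Darwin headers as I recall them and should be checked against your SDK. For an abort(), the PC will usually point inside the system's kill routine, so the call stack is the useful part. backtrace() is not on the POSIX async-signal-safe list, so treat this as best effort. install_crash_logger, crash_handler and the g_crash_* globals are hypothetical names.

```cpp
#include <cassert>
#include <csignal>
#include <cstdint>
#include <cstring>
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical globals so the caller (or a test) can see that the handler ran.
static volatile std::sig_atomic_t g_crash_handled = 0;
static volatile std::sig_atomic_t g_crash_frames = 0;

// Interrupted program counter, read from the context argument (platform-specific).
static void* interrupted_pc(void* context) {
    ucontext_t* uc = static_cast<ucontext_t*>(context);
#if defined(__APPLE__) && (defined(__arm64__) || defined(__aarch64__))
    return reinterpret_cast<void*>(uc->uc_mcontext->__ss.__pc);   // macOS on Apple silicon
#elif defined(__APPLE__) && defined(__x86_64__)
    return reinterpret_cast<void*>(uc->uc_mcontext->__ss.__rip);  // macOS on Intel
#elif defined(__linux__) && defined(__x86_64__)
    return reinterpret_cast<void*>(uc->uc_mcontext.gregs[REG_RIP]);
#elif defined(__linux__) && defined(__aarch64__)
    return reinterpret_cast<void*>(uc->uc_mcontext.pc);
#else
    (void)uc;
    return nullptr;  // unknown platform
#endif
}

// Output goes through write(), which is async-signal-safe; printf/snprintf are not.
static void write_str(const char* s) {
    ssize_t r = write(STDERR_FILENO, s, std::strlen(s));
    (void)r;
}

static void write_num(std::uintptr_t v, unsigned base) {
    char buf[32];
    int i = sizeof buf;
    do {
        buf[--i] = "0123456789abcdef"[v % base];
        v /= base;
    } while (v != 0 && i > 0);
    ssize_t r = write(STDERR_FILENO, buf + i, sizeof buf - i);
    (void)r;
}

static void crash_handler(int sig, siginfo_t* info, void* context) {
    write_str("caught signal ");
    write_num(static_cast<std::uintptr_t>(sig), 10);
    write_str(" from pid ");
    write_num(static_cast<std::uintptr_t>(info->si_pid), 10);
    write_str(", interrupted pc=0x");
    write_num(reinterpret_cast<std::uintptr_t>(interrupted_pc(context)), 16);
    write_str("\n");

    void* frames[64];
    int n = backtrace(frames, 64);                   // best effort inside a handler
    backtrace_symbols_fd(frames, n, STDERR_FILENO);  // one symbolized line per frame
    g_crash_frames = n;
    g_crash_handled = 1;
    // If the signal came from abort(), abort() still terminates the process
    // after this handler returns.
}

// Install crash_handler for one signal; call early in main().
bool install_crash_logger(int sig) {
    void* warmup[1];
    backtrace(warmup, 1);  // glibc may lazily load its unwinder (allocating) on first use

    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;  // one-shot: default action restored on entry
    return sigaction(sig, &sa, nullptr) == 0;
}
```

In the daemon you would call install_crash_logger(SIGABRT) at the top of main(). launchd writes the daemon's stderr to the file named by the StandardErrorPath key in its plist, if you set one.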

Related

Output thread IDs as seen by debugger

I'm developing a multi-threaded C++ application using GCC 4.4.5 and GDB 7.2.
At the moment, I have four threads. Each one interacts with a CAN bus in one form or another, either reading, writing, polling or handling messages.
In order to determine which thread is doing what, I have decided to add the thread IDs to log messages.
In my logging functions, I have the following code:
// This is for outputting debug messages
// (a default-constructed std::thread::id stands for "no thread given")
void logDebug(const std::string& msg, std::thread::id threadId = std::thread::id()) {
#ifdef _DEBUG
threadState.outputLock->lock();
if (threadId != std::thread::id())
std::cout << "[Thread #" << threadId << "] ";
// The rest of the output
threadState.outputLock->unlock();
#endif
}
This is the (debug) output from the application:
[Thread #3085296768] [DEBUG] [Mon Jun 17 10:18:45 2019] CAN frame was empty or no message on bus...
And this is what GDB is telling me:
Thread #3 7575 [core: 0] (Suspended: Breakpoint)
Why does the debugger show different thread IDs/numbers than the application, and is there a way to output the same information from the application that the debugger shows?
The expected behaviour is that the thread IDs are identical.
EDIT:
I forgot to add some possibly important information.
I'm cross-compiling for an embedded device powered by a PowerPC chip, running a derivative of Debian Wheezy.
You can get the thread ID from your application with the following system call: syscall(SYS_gettid). That kernel thread ID (the LWP) is the number GDB shows next to its own thread number (7575 above), whereas std::thread::id in libstdc++ prints the pthread_t value, which is why the two differ.
From there, you can set the thread name by either:
writing the name directly to /proc/PID/task/TID/comm, or
using the pthread function int pthread_setname_np(pthread_t thread, const char *name).
Then, in GDB, you can easily match the thread name, its Linux TID and the GDB thread ID with the info threads command.
Hope this helps.
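A minimal sketch of the approach above, assuming Linux with glibc 2.12 or later (for pthread_setname_np): log the kernel TID instead of std::thread::id, and name each thread so it can be matched against info threads. current_tid, thread_prefix and name_current_thread are hypothetical helpers, not part of any library.

```cpp
#include <cassert>
#include <pthread.h>
#include <sstream>
#include <string>
#include <sys/syscall.h>
#include <unistd.h>

// Kernel thread ID of the calling thread: the "LWP" number GDB displays.
// Linux-only; glibc 2.30+ also offers gettid(), older versions need syscall().
long current_tid() {
    return syscall(SYS_gettid);
}

// Log prefix that can be matched against GDB's "info threads" output.
std::string thread_prefix() {
    std::ostringstream os;
    os << "[TID " << current_tid() << "] ";
    return os.str();
}

// Name the calling thread. Linux allows at most 15 characters plus the NUL;
// longer names make pthread_setname_np fail with ERANGE.
bool name_current_thread(const char* name) {
    return pthread_setname_np(pthread_self(), name) == 0;
}
```

In logDebug you would print thread_prefix() in place of the thread::id, and have each thread call name_current_thread("can-reader") (or similar) once when it starts.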

Why does the program hang in WSL?

I have the following code in my program.
Thread* t = arg->thread;
//at this point, the new thread is being executed.
t->myId = TGetId();
void* (*functor)(void*) = t->functor;
void* fArg = arg->arg;
nfree(arg);
_INFO_PRINTF(1, "Launching thread with ID: %d", t->myId);
sigset_t mask;
sigfillset(&mask); //fill mask with all signals
sigdelset(&mask, SIGUSR1); // allow SIGUSR1 to get to the thread.
sigdelset(&mask, SIGUSR2); // allow SIGUSR2 to get to the thread.
pthread_sigmask(SIG_SETMASK, &mask, NULL); //block some sigs
struct sigaction act;
memset(&act, 0, sizeof(act));
act.sa_handler = TSignalHandler;
act.sa_mask = mask;
if(sigaction(SIGUSR1, &act, NULL))
{
_ERROR_PRINT(1, "Could not set signal action.");
return NULL;
}
if(sigaction(SIGUSR2, &act, NULL))
{
_ERROR_PRINT(1, "Could not set signal action.");
return NULL;
}
void* ret = functor(fArg);
t->hasReturned = true;
return ret;
On native Linux, the thread that executes this code calls the signal handler properly. The problem is that on Windows Subsystem for Linux, the program hangs when SIGUSR1 or SIGUSR2 is sent to the thread via pthread_kill. Why does this work on native Ubuntu (via VMware Workstation 14), Debian and Fedora, but NOT on WSL?
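To narrow down whether the platform or the program is at fault, a small self-contained reproducer can help. This is a hypothetical test, not the code from the question: it installs a SIGUSR1 handler, starts a thread that waits in sigsuspend, and sends it SIGUSR1 with pthread_kill. On native Linux pthread_join returns; on a platform that does not deliver thread-directed signals, it would hang at pthread_join, matching the symptom described above.

```cpp
#include <cassert>
#include <csignal>
#include <cstring>
#include <pthread.h>
#include <signal.h>

// Set by the handler; volatile sig_atomic_t is what a handler may safely write.
static volatile std::sig_atomic_t g_usr1_seen = 0;

static void on_usr1(int) { g_usr1_seen = 1; }

// Target thread: sleep until SIGUSR1 has been handled.
static void* waiter(void*) {
    sigset_t wait_mask;
    sigfillset(&wait_mask);
    sigdelset(&wait_mask, SIGUSR1);  // only SIGUSR1 may wake the wait
    while (!g_usr1_seen)
        sigsuspend(&wait_mask);      // atomically unblock SIGUSR1 and sleep
    return nullptr;
}

// Returns true if a SIGUSR1 sent with pthread_kill reached the target thread.
bool pthread_kill_roundtrip() {
    struct sigaction act;
    std::memset(&act, 0, sizeof act);
    act.sa_handler = on_usr1;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, nullptr) != 0) return false;

    // Block SIGUSR1 before creating the thread: the thread inherits the block,
    // so a signal sent early stays pending until the thread reaches sigsuspend.
    sigset_t usr1, old;
    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &usr1, &old);

    pthread_t t;
    bool ok = false;
    if (pthread_create(&t, nullptr, waiter, nullptr) == 0) {
        if (pthread_kill(t, SIGUSR1) == 0)
            ok = true;
        else
            pthread_cancel(t);       // sigsuspend is a cancellation point
        pthread_join(t, nullptr);    // hangs here if the signal is never delivered
        ok = ok && g_usr1_seen == 1;
    }
    pthread_sigmask(SIG_SETMASK, &old, nullptr);
    return ok;
}
```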
When you have a hanging bug that you cannot reproduce while running within the debugger, you can attach the debugger to the running process after you reproduce the hang. This won't let you observe the variables changing as the program leads up to the hang, but at least you get the stack trace of exactly where the hang is occurring.
Once you know the process id of the hung process (assume it's 12345), you can use:
$ gdb -p 12345
Or, you can kill the process with a signal that will cause a core to be generated. I like to use SIGTRAP, since it is easy to distinguish from a SIGSEGV.
$ kill -SIGTRAP 12345
Then you can load the core file into gdb (for example gdb ./yourprogram core) to discover what the process was hanging on.
The advantage of attaching to the running process is that the process is still live. This allows you to call functions from the debugger, which may provide easier access to diagnostics built into your program. The core file preserves the error, which is beneficial if the hanging bug is difficult to reproduce.

C++ & OpenSSL: SIGPIPE when writing in closed pipe

I'm coding a C++ SSL Server for TCP Connections on Linux.
When the program uses SSL_write() to write into a closed pipe, a SIGPIPE signal is raised, which causes the program to shut down. I know that this is normal behaviour, but the program should not die every time a peer does not close the connection correctly.
I have already googled a lot and tried pretty much everything I found, but nothing seems to work for me. signal(SIGPIPE, SIG_IGN) does not work: the signal still arrives (the same goes for signal(SIGPIPE, SomeKindOfHandler)).
The gdb output:
Program received signal SIGPIPE, Broken pipe.
0x00007ffff6b23ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) where
#0 0x00007ffff6b23ccd in write () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007ffff7883835 in ?? () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#2 0x00007ffff7881687 in BIO_write () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#3 0x00007ffff7b9d3e0 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
#4 0x00007ffff7b9db04 in ?? () from /lib/x86_64-linux-gnu/libssl.so.1.0.0
#5 0x000000000042266a in NetInterface::SendToSubscribers(bool) () at ../Bether/NetInterface.h:181
#6 0x0000000000425834 in main () at ../Bether/main.cpp:111
About the Code:
I'm using a thread which is waiting for new connections and accepting them. The thread then puts the connection information (BIO & SSL) into a static map inside the NetInterface class.
Every 5 seconds, NetInterface::SendToSubscribers() is executed from main(). This function accesses the static map and sends data to every connection in it. This function is also where the SIGPIPE comes from.
I have used signal(SIGPIPE,SIG_IGN) in main() (obviously before the 5-seconds loop) and in NetInterface::SendToSubscribers(), but it is not working anywhere.
Thanks for your help!
You have to call sigaction to change this behavior, either to ignore SIGPIPE or to handle it in a specific way with your own signal handler. Please don't use signal(); its semantics differ between systems, and sigaction() is its recommended replacement.
http://man7.org/linux/man-pages/man2/sigaction.2.html
One way to do it (I haven't compiled this code, but it should be something like this):
void sigpipe_handler(int signal)
{
...
}
int main()
{
struct sigaction sh = {}; // zero-initialize every field first
struct sigaction osh;
sh.sa_handler = &sigpipe_handler; //Can set to SIG_IGN
// Restart interrupted system calls
sh.sa_flags = SA_RESTART;
// Don't block any additional signals while the handler runs
sigemptyset(&sh.sa_mask);
if (sigaction(SIGPIPE, &sh, &osh) < 0)
{
return -1;
}
...
}
If the program is multithreaded, it is a little different, as you have less control over which thread will receive the signal. That depends on the type of signal. SIGPIPE is delivered to the thread whose write generated it. Nevertheless, sigaction should work OK.
It is possible to set the mask in the main thread, and all subsequently created pthreads will inherit the signal mask. Otherwise, the signal mask can be set in each thread.
sigset_t blockedSignal;
sigemptyset(&blockedSignal);
sigaddset(&blockedSignal, SIGPIPE);
pthread_sigmask(SIG_BLOCK, &blockedSignal, NULL);
However, a blocked signal is not discarded: it stays pending (for SIGPIPE, on the thread that caused it) and is delivered as soon as it is unblocked. In this case, use sigtimedwait before unblocking (for example at the end of the thread) to consume the pending signal. A sigaction set in the main thread or in the thread that generated SIGPIPE should work as well.
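A minimal sketch of the block-then-drain approach for a single write, assuming POSIX threads on Linux: block SIGPIPE in the calling thread, perform the write, and if it failed with EPIPE, consume the resulting SIGPIPE with sigtimedwait and a zero timeout before restoring the old mask. write_no_sigpipe is a hypothetical helper; the same wrapping can be placed around an SSL_write call, since the backtrace in the question shows OpenSSL's BIO ending in write(). For plain sockets, send() with MSG_NOSIGNAL avoids the signal altogether on Linux. Note also that gdb reports SIGPIPE even when the program ignores it; handle SIGPIPE nostop noprint pass tells gdb to let it through silently.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <ctime>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical helper: write() that cannot kill the process with SIGPIPE.
// SIGPIPE is blocked in the calling thread only; other threads are unaffected.
ssize_t write_no_sigpipe(int fd, const void* buf, size_t len) {
    sigset_t pipe_set, old_set, pending;
    sigemptyset(&pipe_set);
    sigaddset(&pipe_set, SIGPIPE);

    // If a SIGPIPE is already pending, leave it alone instead of swallowing it.
    sigpending(&pending);
    const bool was_pending = sigismember(&pending, SIGPIPE) == 1;

    pthread_sigmask(SIG_BLOCK, &pipe_set, &old_set);
    ssize_t n = write(fd, buf, len);
    const int saved_errno = errno;

    if (n < 0 && saved_errno == EPIPE && !was_pending) {
        // The failed write left a SIGPIPE pending on this thread; consume it
        // without waiting (zero timeout) so unblocking doesn't deliver it.
        const struct timespec zero = {0, 0};
        while (sigtimedwait(&pipe_set, nullptr, &zero) == -1 && errno == EINTR) {
        }
    }

    pthread_sigmask(SIG_SETMASK, &old_set, nullptr);
    errno = saved_errno;  // report the write's error, not sigtimedwait's
    return n;
}
```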
I've found the solution: it works with pthread_sigmask.
sigset_t set;
sigemptyset(&set);
sigaddset(&set, SIGPIPE);
if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
return -1;
Thanks to everyone for the help!

Deadlock when spawning threads?

I have an application whose main thread spawns another thread, which in turn spawns a thread for each request received, and I'm getting a core dump, probably due to a deadlock. In gdb I see the following:
__lll_lock_wait_private ();
_L_lock_4714 ();
start_thread ();
clone ();
This is generated from the following code sample:
do
{
pthread_t handle;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
pthread_create(&handle, 0, run, msg);
pthread_detach(handle);
} while (!stop);
run is an extern function while the rest of the code is part of class methods.
void* run(void* arg)
{
Handler handler;
Msg* msg = static_cast<Msg*> (arg);
handler.handleMsg(msg);
return NULL;
}
The handleMsg method does some processing and then calls another application through system():
...
system("AnotherApplication param1, param2 &");
...
Note the ampersand. It is on purpose, because I want the process to run asynchronously. The response comes back to the main thread through another type of communication.
This application has been running on Linux:
Linux 2.6.32-358.14.1.el6.x86_64 #1 SMP Mon Jun 17 15:54:20 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
I'm not ignoring any signals.
What could be the problem here?
The pthread_detach manual page tells us:
Attempting to detach an already detached thread results in unspecified behavior.
However, you're creating your threads as detached from the start.
What results do you expect?
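A sketch of creating a detached thread with a single mechanism: set PTHREAD_CREATE_DETACHED on the attribute object, actually pass that object to pthread_create, and do not call pthread_detach afterwards. (In the snippet as posted, attr is initialized but 0 is passed to pthread_create, so the attribute is never applied; whichever detach method is intended, use it once.) spawn_detached is a hypothetical helper, and this addresses only the double detach; it does not by itself rule out other causes of the hang.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical helper: start fn(arg) on a thread that is detached exactly once,
// through the attribute object. Do NOT call pthread_detach on it afterwards.
int spawn_detached(void* (*fn)(void*), void* arg) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0) {
        pthread_t handle;  // must not be joined or detached once the thread runs detached
        rc = pthread_create(&handle, &attr, fn, arg);  // pass &attr, not 0
    }
    pthread_attr_destroy(&attr);  // does not affect a thread already created with it
    return rc;
}
```

In the loop from the question, the body would then shrink to a checked call to spawn_detached(run, msg), with no pthread_detach.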

C++ program does not process any function call or printf during SIGSEGV in gcc

I'm having trouble getting my stack trace printed to stderr or dumped to a log file. I am running the code on Kubuntu 10.04 with the gcc compiler (4.4.3). The issue is that in normal running mode (without gdb), the program outputs nothing except 'Segmentation Fault'. I want to output the backtrace as in the print statements below. When I run my application under gdb, it reaches the printf/fprintf/(function call) statement and then crashes with the following:
669 {
(gdb)
670 printf("Testing for stability.\n");
(gdb)
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff68b1f45 in puts () from /lib/libc.so.6
The strange thing is that if the crash happens in a function within the same file, it works fine and prints the output properly. But if the program crashes in a function outside this file, it does not print any output.
So no printf or file dumping statement or function call gets processed. I am using the following sample code:
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for REG_RIP */
#endif
#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>
int func_b(void); /* defined elsewhere */
void bt_sighandler(int sig, siginfo_t *info,
void *secret) {
void *trace[16];
char **messages = (char **)NULL;
int i, trace_size = 0;
ucontext_t *uc = (ucontext_t *)secret;
/* x86-64: the interrupted instruction pointer is gregs[REG_RIP], not gregs[0] */
void *fault_pc = (void *)uc->uc_mcontext.gregs[REG_RIP];
/* Do something useful with siginfo_t */
if (sig == SIGSEGV)
printf("Got signal %d, faulty address is %p, "
"from %p\n", sig, info->si_addr,
fault_pc);
else
printf("Got signal %d\n", sig);
trace_size = backtrace(trace, 16);
/* overwrite the signal trampoline frame with the faulting address */
trace[1] = fault_pc;
messages = backtrace_symbols(trace, trace_size);
/* skip first stack frame (points here) */
printf("[bt] Execution path:\n");
for (i=1; i<trace_size; ++i)
printf("[bt] %s\n", messages[i]);
exit(0);
}
int main() {
/* Install our signal handler */
struct sigaction sa;
sa.sa_sigaction = bt_sighandler; /* no cast: a (void *) cast does not compile in C++ */
sigemptyset (&sa.sa_mask);
sa.sa_flags = SA_RESTART | SA_SIGINFO;
sigaction(SIGSEGV, &sa, NULL);
sigaction(SIGUSR1, &sa, NULL);
/* Do something */
printf("%d\n", func_b());
}
Thanks in advance for any help.
Unfortunately, you just can't reliably do much of anything in a SIGSEGV handler. Think about it this way: your program has a serious error, and its state (including system-level state such as the heap) is inconsistent.
In such a case, you can't expect the OS to magically fix up the heap and other internals it needs in order to be able to execute arbitrary code within your signal handler.
If the SEGV happens in your own code, the good solution is to use the core file and fix the root problem. If the crash happens in other code, say in a shared library, I'd suggest isolating that code in an entirely separate binary and communicating between the two binaries. Then if the library crashes, your main program does not.
You are supposed to do very little in a signal handler; in principle, only access variables of type volatile sig_atomic_t.
Doing stdio I/O such as printf is definitely out of the question. See this page of the GNU C Library manual:
http://www.gnu.org/s/libc/manual/html_node/Nonreentrancy.html#Nonreentrancy
Try using simpler functions, such as strcat() and write().
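For asynchronous signals such as the SIGUSR1 the question also installs, the rule above leads to a simple pattern: the handler only stores to a volatile sig_atomic_t flag, and the program does the printing (or calls backtrace()) later, in normal context. This is a minimal sketch with hypothetical names; it does not help for SIGSEGV, because returning from a SIGSEGV handler re-executes the faulting instruction.

```cpp
#include <cassert>
#include <csignal>
#include <cstdio>
#include <cstring>
#include <signal.h>

// The only thing the handler touches: a volatile sig_atomic_t flag.
static volatile std::sig_atomic_t g_dump_requested = 0;

static void request_dump(int) { g_dump_requested = 1; }

bool install_dump_request(int sig) {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_handler = request_dump;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(sig, &sa, nullptr) == 0;
}

// Called from the main loop (normal context), where printf and friends are fine.
bool poll_dump_request() {
    if (!g_dump_requested)
        return false;
    g_dump_requested = 0;
    std::printf("[bt] dump requested\n");  // a real program could call backtrace() here
    return true;
}
```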
Is there a reason you can't use valgrind?
When the application crashes, Linux creates a core dump with the state of the application at the time of the crash. The core file can be examined using gdb.
If no core file is created, try changing the core file size limit with
ulimit -c unlimited
in the same shell and before the program is started.
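If you cannot control the shell that starts the program, the same limit can be raised from inside the process at startup with setrlimit(RLIMIT_CORE), which is the limit ulimit -c changes. A minimal sketch, assuming Linux or another POSIX system: an unprivileged process may raise its soft limit only up to the hard limit, so this sets soft = hard. enable_core_dumps is a hypothetical helper, and where the core file ends up is still governed by /proc/sys/kernel/core_pattern.

```cpp
#include <cassert>
#include <sys/resource.h>

// Raise the soft core-file size limit to the hard limit (like `ulimit -c`, up to the max).
bool enable_core_dumps() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return false;
    lim.rlim_cur = lim.rlim_max;  // unprivileged processes may not exceed rlim_max
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```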
The name of the core file is usually core.PID where PID is the pid of the program.
The core file is usually placed somewhere in /tmp or the directory where the program was started.
A lot more info on core files is available on the man page for core. Use
man core
to read the man page.
I managed to get it partially working. I was actually running the application with sudo. Running it as a normal user gives me the call stack. However, running as a normal user disables hardware acceleration (nvidia graphics drivers). To resolve that, I added myself to the 'video' group, so that I have access to /dev/nvidia0 and /dev/nvidiactl. However, once I have that access, the stack is no longer generated. It's only when I am in user mode and hardware acceleration is disabled that the stack appears. But I can't run my application without hardware acceleration (some important functionality would be disabled). Please let me know if anyone has any idea.
Thanks.