Why does the program hang in WSL? - c++

I have the following code in my program.
Thread* t = arg->thread;
//at this point, the new thread is being executed.
t->myId = TGetId();
void* (*functor)(void*) = t->functor;
void* fArg = arg->arg;
nfree(arg);
_INFO_PRINTF(1, "Launching thread with ID: %d", t->myId);
sigset_t mask;
sigfillset(&mask); //fill mask with all signals
sigdelset(&mask, SIGUSR1); // allow SIGUSR1 to get to the thread.
sigdelset(&mask, SIGUSR2); // allow SIGUSR2 to get to the thread.
pthread_sigmask(SIG_SETMASK, &mask, NULL); //block some sigs
struct sigaction act;
memset(&act, 0, sizeof(act));
act.sa_handler = TSignalHandler;
act.sa_mask = mask;
if(sigaction(SIGUSR1, &act, NULL))
{
_ERROR_PRINT(1, "Could not set signal action.");
return NULL;
}
if(sigaction(SIGUSR2, &act, NULL))
{
_ERROR_PRINT(1, "Could not set signal action.");
return NULL;
}
void* ret = functor(fArg);
t->hasReturned = true;
return ret;
The thread that executes this code will properly call the signal handler when on native linux. The problem is that on Windows Subsystem for Linux, the program hangs with the SIGUSR1 or SIGUSR2 is sent via pthread_kill which sends signals to a thread. Why does this work on native ubuntu (via VMWARE WORKSTATION 14) and debian and fedora, but NOT WSL?

When you have a hanging bug that you cannot reproduce when running within the debugger, you can attach the debugger to the running process after you reproduce the hang. This won't let you observe the variables changing as you lead to the hang, but at least you get the stack trace of exactly where the hang is occurring.
Once you know the process id of the hung process (assume it's 12345), you can use:
$ gdb -p 12345
Or, you can kill the process with a signal that will cause a core to be generated. I like to use SIGTRAP, since it is easy to distinguish from a SIGSEGV.
$ kill -SIGTRAP 12345
And then you can use gdb to discover what the process was hanging on.
The advantage of attaching to the running process is that the process is still live. This allows you to call functions from the debugger, which may provide easier access to diagnostics built into your program. The core file preserves the error, which is beneficial if the hanging bug is difficult to reproduce.

Related

how to get process status ( running , killed ) event?

How to I get the status of another process?
i want to know the execution status of another process.
i want to receive and process the event as a inotify.
no search /proc by periods.
how to another process status (running , killed ) event?
SYSTEM : linux, solaris, aix
Linux
Under Linux (and probably many Unixes system) you can achieve this by using the ptrace call, then using waitpid to wait for status:
manpages:
ptrace call: http://man7.org/linux/man-pages/man2/ptrace.2.html
waitpid call: https://linux.die.net/man/2/waitpid
From the manpage:
Death under ptrace
When a (possibly multithreaded) process receives a killing signal
(one whose disposition is set to SIG_DFL and whose default action is
to kill the process), all threads exit. Tracees report their death
to their tracer(s). Notification of this event is delivered via
waitpid(2).
beware that you will need to have special authorization in certain cases. Take a look at /proc/sys/kernel/yama/ptrace_scope. (if you can modify the target program, you can also change the behavior of ptrace by calling ptrace(PTRACE_TRACEME, 0, nullptr, nullptr);
To use ptrace, first you must get your process PID, then call PTRACE_ATTACH:
// error checking removed for the sake of clarity
#include <sys/ptrace.h>
pid_t child_pid;
// ... Get your child_pid somehow ...
// 1. attach to your process:
long err;
err = ptrace(PTRACE_ATTACH, child_pid, nullptr, nullptr);
// 2. wait for your process to stop:
int process_status;
err = waitpid(child_pid, &process_status, 0);
// 3. restart the process (continue)
ptrace(PTRACE_CONT, child_pid, nullptr, nullptr);
// 4. wait for any change in status:
err = waitpid(child_pid, &process_status, 0);
// while waiting, the process is running...
// by default waitpid will wait for process to terminate, but you can
// change this with WNOHANG in the options.
if (WIFEXITED(status)) {
// exitted
}
if (WIFSIGNALED(status)) {
// process got a signal
// WTERMSIG(status) will get you the signal that was sent.
}
AIX:
The solution will need some adaptation to work with AIX, have a look at the doc there:
ptrace documentation: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.basetrf1/ptrace.htm
waitpid documentation: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.basetrf1/ptrace.htm
Solaris
As mentionned here ptrace may not be available on your version of Solaris, you may have to resort to procfs there.

Cancelling thread that is stuck on epoll_wait

I'm doing some event handling with C++ and pthreads. I have a main thread that reads from event queue I defined, and a worker thread that fills the event queue. The queue is of course thread safe.
The worker thread have a list of file descriptors and create an epoll system call to get events on those file descriptors. It uses epoll_wait to wait for events on the fd's.
Now the problem. Assuming I want to terminate my application cleanly, how can I cancel the worker thread properly? epoll_wait is not one of the cancellation points of pthread(7) so it cannot react properly on pthread_cancel.
The worker thread main() looks like this
while(m_WorkerRunning) {
epoll_wait(m_EpollDescriptor, events, MAXEVENTS, -1);
//handle events and insert to queue
}
The m_WorkerRunning is set to true when the thread starts and it looks like I can interrupt the thread by settings m_WorkerRunning to false from the main thread. The problem is that epoll_wait theoretically can wait forever.
Other solution I though about is: instead of waiting forever (-1) I can wait for example X time slots, then handle properly no-events case and if m_WorkerRunning == false then exit the loop and terminate the worker thread cleanly. The main thread then sets m_WorkerRunning to false, and sleeps X. However I'm not sure about the performance of such epoll_wait and also not sure what would be the correct X? 500ms? 1s? 10s?
I'd like to hear some experienced advises!
More relevant information: the fd's I'm waiting events on, are devices in /dev/input so technically I'm doing some sort of input subsystem. The targeted OS is Linux (latest kernel) on ARM architecture.
Thanks!
alk's answer above is almost correct. The difference, however, is very dangerous.
If you are going to send a signal in order to wake up epoll_wait, never use epoll_wait. You must use epoll_pwait, or you might run into a race with your epoll never waking up.
Signals arrive asynchronously. If your SIGUSR1 arrives after you've checked your shutdown procedure, but before your loop returns to the epoll_wait, then the signal will not interrupt the wait (as there is none), but neither will the program exit.
This might be very likely or extremely unlikely, depending on how long the loop takes in relation to how much time is spent in the wait, but it is a bug one way or the other.
Another problem with alk's answer is that it does not check why the wait was interrupted. It might be any number of reasons, some unrelated to your exit.
For more information, see the man page for pselect. epoll_pwait works in a similar way.
Also, never send signals to threads using kill. Use pthread_kill instead. kill's behavior when sending signals is, at best, undefined. There is no guarantee that the correct thread will receive it, which might cause an unrelated system call to be interrupted, or nothing at all to happen.
You could send the thread a signal which would interupt the blocking call to epoll_wait(). If doing so modify your code like this:
while(m_WorkerRunning)
{
int result = epoll_wait(m_EpollDescriptor, events, MAXEVENTS, -1);
if (-1 == result)
{
if (EINTR == errno)
{
/* Handle shutdown request here. */
break;
}
else
{
/* Error handling goes here. */
}
}
/* Handle events and insert to queue. */
}
A way to add a signal handler:
#include <signal.h>
/* A generic signal handler doing nothing */
void signal_handler(int sig)
{
sig = sig; /* Cheat compiler to not give a warning about an unused variable. */
}
/* Wrapper to set a signal handler */
int signal_handler_set(int sig, void (*sa_handler)(int))
{
struct sigaction sa = {0};
sa.sa_handler = sa_handler;
return sigaction(sig, &sa, NULL);
}
To set this handler for the signal SIGUSR1 do:
if (-1 == signal_handler_set(SIGUSR1, signal_handler))
{
perror("signal_handler_set() failed");
}
To send a signal SIGUSR1 from another process:
if (-1 == kill(<target process' pid>, SIGUSR1))
{
perror("kill() failed");
}
To have a process send a signal to itself:
if (-1 == raise(SIGUSR1))
{
perror("raise() failed");
}

c++ fork, without wait, defuncts execl

Looking to fork a process, in c++, that wont hang its parent process - its parent is a daemon and must remain running. If i wait() on the forked process the forked execl wont defunt - but - it will also hang the app - not waiting fixes the app hang - but the command becomes defunt.
if((pid = fork()) < 0)
perror("Error with Fork()");
else if(pid > 0) {
//wait here will hang the execl in the parent
//dont wait will defunt the execl command
//---- wait(&pid);
return "";
} else {
struct rlimit rl;
int i;
if (rl.rlim_max == RLIM_INFINITY)
rl.rlim_max = 1024;
for (i = 0; (unsigned) i < rl.rlim_max; i++)
close(i);
if(execl("/bin/bash", "/bin/bash", "-c", "whoami", (char*) 0) < 0) perror("execl()");
exit(0);
}
How can I fork the execl without a wait(&pid) where execl's command wont defunct?
UPDATE
Fixed by adding the following before the fork
signal(SIGCHLD, SIG_IGN);
Still working with my limited skills at a more compatible solution based on the accepted answer. Thanks!
By default, wait and friends wait until a process has exited, then reap it. You can call waitpid with the WNOHANG to return immediately if no child has exited.
The defunct/"zombie" process will sit around until you wait on it. So if you run it in the background, you must arrange to reap it eventually by any of several ways:
try waitpid with WNOHANG routinely: int pid = waitpid(-1, &status, WNOHANG)
install a signal handler for SIGCHLD to be notified when it exits
Additionally, under POSIX.1-2001, you can use sigaction set the SA_NOCLDWAIT on SIGCHLD. Or set its action to SIG_IGN. Older systems (including Linux 2.4.x, but not 2.6.x or 3.x) don't support this.
Check your system manpages, or alternative the wait in the Single Unix Specification. The Single Unix Spec also gives some helpful code examples. SA_NOCLDWAIT is documented in sigaction.
I think a signal handler would be the best way as indicated. I would like to point out another way this could be handled: Fork twice and have the child exit while the grandchild would call execl. The defunct process would then be cleaned up by the init process.
As said in comment, double fork saves process from defunct state.
What is the reason for performing a double fork when creating a daemon?

Child process receives parent's SIGINT

I have one simple program that's using Qt Framework.
It uses QProcess to execute RAR and compress some files. In my program I am catching SIGINT and doing something in my code when it occurs:
signal(SIGINT, &unix_handler);
When SIGINT occurs, I check if RAR process is done, and if it isn't I will wait for it ... The problem is that (I think) RAR process also gets SIGINT that was meant for my program and it quits before it has compressed all files.
Is there a way to run RAR process so that it doesn't receive SIGINT when my program receives it?
Thanks
If you are generating the SIGINT with Ctrl+C on a Unix system, then the signal is being sent to the entire process group.
You need to use setpgid or setsid to put the child process into a different process group so that it will not receive the signals generated by the controlling terminal.
[Edit:]
Be sure to read the RATIONALE section of the setpgid page carefully. It is a little tricky to plug all of the potential race conditions here.
To guarantee 100% that no SIGINT will be delivered to your child process, you need to do something like this:
#define CHECK(x) if(!(x)) { perror(#x " failed"); abort(); /* or whatever */ }
/* Block SIGINT. */
sigset_t mask, omask;
sigemptyset(&mask);
sigaddset(&mask, SIGINT);
CHECK(sigprocmask(SIG_BLOCK, &mask, &omask) == 0);
/* Spawn child. */
pid_t child_pid = fork();
CHECK(child_pid >= 0);
if (child_pid == 0) {
/* Child */
CHECK(setpgid(0, 0) == 0);
execl(...);
abort();
}
/* Parent */
if (setpgid(child_pid, child_pid) < 0 && errno != EACCES)
abort(); /* or whatever */
/* Unblock SIGINT */
CHECK(sigprocmask(SIG_SETMASK, &omask, NULL) == 0);
Strictly speaking, every one of these steps is necessary. You have to block the signal in case the user hits Ctrl+C right after the call to fork. You have to call setpgid in the child in case the execl happens before the parent has time to do anything. You have to call setpgid in the parent in case the parent runs and someone hits Ctrl+C before the child has time to do anything.
The sequence above is clumsy, but it does handle 100% of the race conditions.
What are you doing in your handler? There are only certain Qt functions that you can call safely from a unix signal handler. This page in the documentation identifies what ones they are.
The main problem is that the handler will execute outside of the main Qt event thread. That page also proposes a method to deal with this. I prefer getting the handler to "post" a custom event to the application and handle it that way. I posted an answer describing how to implement custom events here.
Just make the subprocess ignore SIGINT:
child_pid = fork();
if (child_pid == 0) {
/* child process */
signal(SIGINT, SIG_IGN);
execl(...);
}
man sigaction:
During an execve(2), the dispositions of handled signals are reset to the default;
the dispositions of ignored signals are left unchanged.

C++ program does not process any function call or printf during SIGSEGV in gcc

I am having problem in getting my stack trace output to stderr or dumping to a log file. I am running the code in Kubuntu10.04 with gcc compiler (4.4.3). The issue is that in the normal running mode (without gdb), the program does not output anything except 'Segmentation Fault'. I wish to output the backtrace output as in the print statements below. When I run gdb with my application, it comes to the printf/fprintf/(function call) statement, and then crashes with the following statement:
669 {
(gdb)
670 printf("Testing for stability.\n");
(gdb)
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00007ffff68b1f45 in puts () from /lib/libc.so.6
The strange things is that it works if I call a function within the same file that crashes, it works fine and spews the output properly. But if the program crashes in a function outside this file, it does not print any output.
So no printf or file dumping statement or function call gets processed. I am using the following sample code:
void bt_sighandler(int sig, siginfo_t *info,
void *secret) {
void *trace[16];
char **messages = (char **)NULL;
int i, trace_size = 0;
ucontext_t *uc = (ucontext_t *)secret;
/* Do something useful with siginfo_t */
if (sig == SIGSEGV)
printf("Got signal %d, faulty address is %p, "
"from %p\n", sig, info->si_addr,
uc->uc_mcontext.gregs[0]);
else
printf("Got signal %d#92; \n", sig);
trace_size = backtrace(trace, 16);
/* overwrite sigaction with caller's address */
trace[1] = (void *) uc->uc_mcontext.gregs[0];
messages = backtrace_symbols(trace, trace_size);
/* skip first stack frame (points here) */
printf("[bt] Execution path:#92; \n");
for (i=1; i<trace_size; ++i)
printf("[bt] %s#92; \n", messages[i]);
exit(0);
}
int main() {
/* Install our signal handler */
struct sigaction sa;
sa.sa_sigaction = (void *)bt_sighandler;
sigemptyset (&sa.sa_mask);
sa.sa_flags = SA_RESTART | SA_SIGINFO;
sigaction(SIGSEGV, &sa, NULL);
sigaction(SIGUSR1, &sa, NULL);
/* Do something */
printf("%d#92; \n", func_b());
}
Thanks in advance for any help.
Unfortunately you just can't reliably do much of anything in a SIGSEGV handler. Think about it this way: Your program has a serious error and its state (including system level state such as the heap) is in an inconsistent state.
In such a case, you can't expect the OS to magically fix up the heap and other internals it needs in order to be able to execute arbitrary code within your signal handler.
If the SEGV happens in your own code, the good solution is to use the core and fix the root problem. If the core happens in other code via say a shared library, I'd suggest isolating that code in an entirely separate binary and communicate between the two binaries. Then if the library crashes your main program does not.
You are supposed to do very little in a signal handler, in principle only access variables of type sig_atomic_t and volatile data.
Doing I/O is definitely out of the question. See this page for gcc:
http://www.gnu.org/s/libc/manual/html_node/Nonreentrancy.html#Nonreentrancy
Try using simpler functions, such as strcat() and write().
Is there a reason you can't use valgrind?
When the application crashes Linux creates a core dump with the state of the application when it crashed. The core file can be examined using gdb.
If no core file is created try changing core file size with
ulimit -c unlimited
in the same shell and before the program is started.
The name of the core file is usually core.PID where PID is the pid of the program.
The core file is usually placed somewhere in /tmp or the directory where the program was started.
A lot more info on core files is available on the man page for core. Use
man core
to read the man page.
I managed to get it partially working. Actually I was running the application in 'sudo' mode. Running it in user mode gives me the callstack. However running in user mode disables hardware acceleration (nvidia graphics drivers). To resolve that, I added myself to the 'video' group, so that I have access to /dev/nvidia0 & /dev/nvidiactl. However when I get the access the stack does not get generated anymore. Its only when I am in user mode and hardware acceleration is disabled, the stack is coming. But I can't run my application without hardware acceleration (mean some important functionality would get disabled). Please let me know if anyone has any idea.
Thanks.