Spawned processes becoming defunct - c++

I use posix_spawnp to spawn child processes from my main process.
int iRet = posix_spawnp(&iPID, zPath, NULL, NULL, argv, environ);
if (iRet != 0)
{
return false;
}
Sometimes, after a child process is spawned without errors, it suddenly becomes defunct. How could this occur?
I use a signal handler to reap child processes:
void SigCatcher(int n)
{
while(waitpid( -1, NULL, WNOHANG ) > 0);
}
and I install it (with signal()) whenever I kill a child process:
kill(oProcID, SIGKILL);
signal (SIGCHLD, SigCatcher);
Could this cause spawned children to go defunct (without me calling kill)?

This:
kill(oProcID, SIGKILL);
signal (SIGCHLD, SigCatcher);
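looks like a race condition. You need to install the signal handler before killing the child process. Otherwise the child may exit and SIGCHLD may arrive before the handler exists, in which case the default action (ignoring the signal) applies and the child is left defunct.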

Have you called:
signal(SIGCHLD, SigCatcher);
anywhere else?
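If you haven't, then you need to do so before any child processes are even spawned, to ensure that those children are reaped when they terminate. Otherwise a child that exits on its own before your first kill() is never waited for and stays defunct.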
As Unwind points out, your current calls to kill and signal are the wrong way around.
Typical use would be:
signal(SIGCHLD, handler);
posix_spawnp(...);
...
// do other stuff
...
kill(pid, SIGKILL);
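A minimal sketch of the ordering both answers describe, assuming a POSIX system: the SIGCHLD handler is installed once with sigaction() before anything is spawned, so every child is reaped whether it was killed or exited on its own. The helper names install_sigchld_reaper and spawn_child are hypothetical. The handler saves and restores errno because waitpid() can overwrite it in the middle of the code it interrupted.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

// Reap every child that has already exited; WNOHANG keeps the handler from blocking.
static void SigCatcher(int)
{
    int savedErrno = errno;  // waitpid() may change errno
    while (waitpid(-1, nullptr, WNOHANG) > 0) {
    }
    errno = savedErrno;
}

// Hypothetical helper: install the reaper once, before the first spawn.
static bool install_sigchld_reaper()
{
    struct sigaction sa = {};
    sa.sa_handler = SigCatcher;
    sigemptyset(&sa.sa_mask);
    // SA_RESTART restarts interrupted syscalls; SA_NOCLDSTOP skips stop/continue notifications.
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, nullptr) == 0;
}

// Hypothetical helper: spawn a program found via PATH; returns its pid, or -1 on failure.
static pid_t spawn_child(const char *path, char *const argv[])
{
    pid_t pid = -1;
    if (posix_spawnp(&pid, path, nullptr, nullptr, argv, environ) != 0)
        return -1;
    return pid;
}
```

With the handler in place from the start, kill(pid, SIGKILL) no longer needs to be followed by signal(). If the exit status is never needed, setting SIGCHLD to SIG_IGN (or using SA_NOCLDWAIT) lets the kernel discard terminated children without any handler, but waitpid() can then no longer report their exit status.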

Related

Why does my SIGHUP handler fail in the child process after fork and exec?

I am trying to write a C++ server demo that restarts when it receives SIGHUP from the terminal (kill -SIGHUP xx_pid), but oddly, only the parent process catches SIGHUP. I use fork and exec to create the child process, and the parent and child run the same code. Why does my SIGHUP handler fail in the child process, and how can I make it work there? (The SIGHUP handler itself does the fork and exec.) Here is the code:
void restart_handler(int signo) {
pid_t pid = fork();
...
execve("./server", argv_cstr.get(), NULL);
}
int main() {
// signal(SIGHUP, restart_handler);
struct sigaction action = {};
action.sa_handler = restart_handler;
sigemptyset(&action.sa_mask);
action.sa_flags |= SA_RESTART;
action.sa_flags |= SA_RESETHAND;
if (sigaction(SIGHUP, &action, NULL) < 0) {
LOG(ERROR) << "Fail to register SIGHUP, abort";
abort();
}
while (1) {
std::cout << "a" << std::flush;
usleep(1000 * 4000L);
}
return 0;
}
I run the code in a terminal with ./server; its pid is 2899. After I type kill -SIGHUP 2899, it creates a child process, 39875. But when I then run kill -SIGHUP 39875, nothing happens. Why? The child process also executes the whole program, including the SIGHUP registration at the start of main(), so why can only the parent process handle SIGHUP?
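One likely cause worth checking: a handler installed without SA_NODEFER runs with its own signal blocked, and both fork() and execve() preserve the calling thread's signal mask. A child forked from inside the SIGHUP handler therefore starts ./server with SIGHUP still blocked, so kill -SIGHUP leaves the signal pending instead of delivering it, and registering the handler again does not change the mask. This assumes the platform does not treat SA_RESETHAND as implying SA_NODEFER (POSIX allows either). A minimal sketch of unblocking SIGHUP explicitly at startup; the helper names unblock_sighup and sighup_blocked are hypothetical.

```cpp
#include <cassert>
#include <csignal>

// Remove SIGHUP from the calling thread's blocked set. Call early in main(),
// before creating threads, because a parent that forked inside its SIGHUP
// handler passes the blocked mask on through fork() and execve().
static bool unblock_sighup()
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGHUP);
    return pthread_sigmask(SIG_UNBLOCK, &set, nullptr) == 0;
}

// True if SIGHUP is currently blocked in the calling thread.
static bool sighup_blocked()
{
    sigset_t current;
    pthread_sigmask(SIG_BLOCK, nullptr, &current);  // NULL set: query only
    return sigismember(&current, SIGHUP) == 1;
}
```

Unblocking SIGHUP in the forked child just before calling execve(), or adding SA_NODEFER to sa_flags, should have the same effect.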

sigwait() does not work in a multithreaded program

I'm trying to write a multithreaded program in which one thread (thread, below) is responsible for handling any asynchronous signals sent to the process.
However, the thread that calls sigwait() does not react to any signal sent to the process (such as SIGUSR1 below).
static void * signal_thread(void *arg = nullptr)
{
int sig = -1;
sigset_t sigset;
sigfillset(&sigset);
pthread_sigmask(SIG_BLOCK, &sigset, NULL);
while(1)
{
int s = sigwait(&sigset, &sig);
if(s == 0)
printf("SIG %d received!...\n", sig);
usleep(20);
}
}
int main()
{
sigset_t signalset;
pthread_t thread;
pthread_create(&thread, NULL, &signal_thread, nullptr);
sigfillset(&signalset);
pthread_sigmask(SIG_BLOCK, &signalset, NULL);
while(1)
{
raise(SIGUSR1);
usleep(20);
}
}
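The problem comes down to two issues:
First, calling raise() in main sent the signal only to the main thread, not to the whole process.
Second, std::cout was used instead of printf in signal_thread; strictly, printf is also safe there, since code that runs after sigwait() returns executes in ordinary thread context rather than inside a signal handler.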
raise(sig) is the equivalent of calling pthread_kill(pthread_self(), sig).
Since the main thread raise()s the signal, SIGUSR1 is generated for that thread and not for any other. Thus your signal_thread cannot sigwait() for SIGUSR1, which stays pending for the thread that generated it. Sending it with kill(getpid(), SIGUSR1) instead generates it for the process as a whole.
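A minimal sketch of the usual pattern, assuming SIGUSR1 is the signal of interest: block it in main() before creating any thread (new threads inherit the mask), then send it to the whole process with kill(getpid(), ...) instead of raise(), so the dedicated thread's sigwait() can accept it. The names wait_for_sigusr1 and block_sigusr1 are hypothetical, and the thread handles a single signal to keep the sketch short.

```cpp
#include <cassert>
#include <csignal>
#include <pthread.h>
#include <unistd.h>

// Dedicated signal thread: waits for one SIGUSR1 and stores the signal number in *out.
static void *wait_for_sigusr1(void *out)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    int sig = 0;
    if (sigwait(&set, &sig) == 0)  // sigwait returns 0 or a positive error number
        *static_cast<int *>(out) = sig;
    return nullptr;
}

// Block SIGUSR1 in the calling thread. Call this before pthread_create so every
// thread inherits the blocked mask and none of them runs the default action.
static bool block_sigusr1()
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    return pthread_sigmask(SIG_BLOCK, &set, nullptr) == 0;
}
```

In a long-running program, the sigwait() call would sit in a loop, as in the question's signal_thread.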

How to wait for two pthreads?

Could anyone tell me what happens between the last two lines of code?
// Creating Server and Client threads
pthread_create(&serverThread, NULL, (void* (*)(void*))&Server,(void *)0);
pthread_create(&clientThread, NULL, (void* (*)(void*))&Client,(void *)1);
// Wait until serverThread exits
pthread_join( serverThread, NULL);
// Wait until clientThread exits
pthread_join( clientThread, NULL);
I want to wait for them simultaneously. What happens if one of the two threads terminates? What if the server keeps running in an infinite loop?
The first join, pthread_join(serverThread, NULL);, waits until serverThread terminates. If serverThread runs an infinite loop, this call never returns.
The clientThread may or may not terminate during this time. If it does, its exit status and resources are kept (much like a zombie process) until pthread_join(clientThread, NULL); is called, and that call then returns immediately.
If clientThread has not yet finished execution when pthread_join(clientThread, NULL); is called, it will wait again until clientThread terminates.
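A small sketch illustrating the answer, using hypothetical Server and Client functions that already have the correct void *(void *) signature (which also removes the need for the function-pointer casts in the question). The client finishes first, yet joining the server first is fine: the later join of the client returns at once with its result.

```cpp
#include <cassert>
#include <cstdint>
#include <pthread.h>
#include <unistd.h>

// Hypothetical server: runs longer than the client, then returns its argument.
static void *Server(void *arg)
{
    usleep(200 * 1000);  // 200 ms of simulated work
    return arg;
}

// Hypothetical client: finishes almost immediately.
static void *Client(void *arg)
{
    return arg;
}

// Start both threads, join them in a fixed order, and report their return values.
static bool run_both(std::intptr_t &serverResult, std::intptr_t &clientResult)
{
    pthread_t serverThread, clientThread;
    if (pthread_create(&serverThread, nullptr, Server, reinterpret_cast<void *>(std::intptr_t{0})) != 0)
        return false;
    if (pthread_create(&clientThread, nullptr, Client, reinterpret_cast<void *>(std::intptr_t{1})) != 0) {
        pthread_join(serverThread, nullptr);
        return false;
    }
    void *ret = nullptr;
    pthread_join(serverThread, &ret);  // blocks ~200 ms; the client exits meanwhile
    serverResult = reinterpret_cast<std::intptr_t>(ret);
    pthread_join(clientThread, &ret);  // client already finished: returns at once
    clientResult = reinterpret_cast<std::intptr_t>(ret);
    return true;
}
```

pthreads has no "join whichever finishes first" call, so waiting on both at once needs extra synchronization, for example a counter protected by a mutex and a condition variable that each thread signals before returning.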

WinAPI timer callback thread never returns

I have to debug some code that I did not write.
This code implements a timer API using the WinAPI timer queue interface.
I'm not very used to this WinAPI functionality, so I could use your help :)
From what I understand, this code is structured like this:
=> Init()
timerQueue = CreateTimerQueue();
=> CreateTimer()
CreateTimerQueueTimer(timerHandle, timerQueue, timerCallback, ..., WT_EXECUTEDEFAULT);
=> timerCallback()
DeleteTimerQueueTimer(timerQueue, timerHandle, NULL);
callback() // Launch user-defined callback
=> CleanUp() // to be called at the end
DeleteTimerQueueEx(timerQueue , INVALID_HANDLE_VALUE);
When we test this, the user-defined callbacks are executed successfully after the desired amount of time. But after that, the timerCallback threads remain pending and never return, preventing the whole process from exiting. Using the VS debugger I can see those threads (named TppWorkerThread#4) on the thread...
Perhaps we are missing something needed to make the callbacks return properly, or we have created some sort of deadlock... However, I cannot figure it out.
Please let me know if I forgot some relevant information.
Thank you for your help.
EDIT:
Further information :
- The blocking threads are in this state at the end of the process:
* Category :Worker Thread
* Name : _TppWorkerThread#4
* Location : _ZwWaitForWorkViaWorkerFactory#8
* Priority : Normal
EDIT2:
Having had some more time to work on this strange behavior, I am now able to reproduce it in standalone code.
#include <windows.h>
#include <stdio.h>
HANDLE gDoneEvent;
HANDLE hTimer[5];
HANDLE hTimerQueue = NULL;
HANDLE g_threadHandle;
void PeriodicCallback(void)
{
printf("Periodic routine called.\n");
}
void SingleCallback(void)
{
printf("Single routine called.\n");
if (!DeleteTimerQueueTimer(hTimerQueue, hTimer[2], NULL))
printf("DeleteTimerQueueTimer() fail. Return value is %d.\n", GetLastError());
}
void CALLBACK CommonCallback(PVOID lpParam, BOOLEAN TimerOrWaitFired)
{
printf("Common routine called. Parameter is %p.\n", lpParam);
((void (*)(void))lpParam)();
}
DWORD WINAPI MainTest(LPVOID lpParam)
{
// Use an event object to track the TimerRoutine execution
gDoneEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
if (NULL == gDoneEvent)
{
printf("CreateEvent failed (%d)\n", GetLastError());
return -1;
}
if(0 == SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL))
{
printf("SetThreadPriority failed (%d)\n", GetLastError());
return -2;
}
// Create the timer queue.
hTimerQueue = CreateTimerQueue();
if (NULL == hTimerQueue)
{
printf("CreateTimerQueue failed (%d)\n", GetLastError());
return -3;
}
/*
if (!CreateTimerQueueTimer( &hTimer[2], hTimerQueue,
(WAITORTIMERCALLBACK)CommonCallback, &SingleCallback, 1000, 0, WT_EXECUTEDEFAULT))
{
printf("CreateTimerQueueTimer failed (%d)\n", GetLastError());
return -4;
}
*/
if (!CreateTimerQueueTimer( &hTimer[4], hTimerQueue,
(WAITORTIMERCALLBACK)CommonCallback, &PeriodicCallback, 10, 500, WT_EXECUTEDEFAULT))
{
printf("CreateTimerQueueTimer failed (%d)\n", GetLastError());
return -5;
}
// TODO: Do other useful work here
printf("Call timer routine in 10 seconds...\n");
Sleep(4000);
CloseHandle(gDoneEvent);
if (!DeleteTimerQueueTimer(hTimerQueue, hTimer[4], INVALID_HANDLE_VALUE))
printf("DeleteTimerQueueTimer failed (%d)\n", GetLastError());
// Delete all timers in the timer queue.
if (!DeleteTimerQueueEx(hTimerQueue, INVALID_HANDLE_VALUE))
printf("DeleteTimerQueue failed (%d)\n", GetLastError());
Sleep(1000);
ExitThread(0);
}
int main(int argc, char *argv[])
{
g_threadHandle = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)MainTest, NULL, 0, NULL);
if (NULL == g_threadHandle)
printf("Creation fail");
ExitThread(0);
}
I'm compiling this code with Visual Studio 2010 Professional.
It appears that even after calling DeleteTimerQueueTimer(), some threads remain pending in the thread pool, preventing my process from shutting down. I still cannot figure it out...
When you call DeleteTimerQueueEx with INVALID_HANDLE_VALUE as its second parameter, it blocks until all running callbacks have completed. The error may be in one of your callback functions that never returns.
You are calling DeleteTimerQueueTimer(timerQueue, timerHandle, NULL); with NULL as the third parameter, which does not wait for the callback to complete if one is running when you delete the timer. I suggest using DeleteTimerQueueTimer(timerQueue, timerHandle, INVALID_HANDLE_VALUE), which blocks until the callback completes (if one is running). Calling CleanUp() without using the blocking version of DeleteTimerQueueTimer is likely a bug, as you may be cleaning up while the callback is still executing.
It could also be a problem of calling DeleteTimerQueueEx or DeleteTimerQueueTimer in its blocking (INVALID_HANDLE_VALUE) form from within a callback, which is forbidden. Break on execution of DeleteTimerQueueEx and check which thread you are in; if it's a TppWorkerThread, then you have found your bug.
EDIT:
In your comment you say you do call DeleteTimerQueueTimer from within the callback, but without INVALID_HANDLE_VALUE. Rereading the documentation at http://msdn.microsoft.com/en-us/library/windows/desktop/ms682569%28v=vs.85%29.aspx, this does seem to be legal, but I distinctly remember us making design decisions to avoid it. I'm sorry this is so vague; I hope someone can give authoritative advice on this.
We send an event/message to the queue of a non-timer thread, which then removes the timer; you could even have a dedicated thread for this, but that is probably overkill. At the end of the day, you need to be sure that the timer is removed before doing cleanup, so you have to either block on removal or have some other thread do it when an event is signaled.
After some work on this issue, I think I have an answer.
It appears that the timer queue API is built on top of the WinAPI thread pool: when we create a timer queue, Windows creates a thread pool from which all callbacks are launched.
So far, no problem, but when we delete the timer queue, this thread pool does not appear to be deleted...
As a result, some threads remain idle, waiting to be used, and prevent the process from exiting.
After some time (a timeout?) those threads return and the process exits.
I don't really understand why this pool is not closed... but for now I use a workaround:
exit(0);
at the end of my program. It's a bit brutal, but it does the job (i.e., it kills my process whether or not threads are still pending).

I want to know which signal arrived when a system call is interrupted

My application has two threads. Each thread receives data from the server via its own socket, and each blocks in epoll_wait(). Sometimes epoll_wait() returns -1 with errno set to EINTR, which means the system call was interrupted by a signal. I added handling for EINTR.
However, I do not know which signal arrived or why, and I would like to find out.
Method 1.
I created a thread that runs the following:
sigset_t sMaskOfSignal;
sigset_t sOldMaskOfSignal;
int sArrivedSignal;
sigfillset(&sMaskOfSignal);
sigprocmask(SIG_UNBLOCK, &sMaskOfSignal, &sOldMaskOfSignal);
while(1)
{
sigwait(&sMaskOfSignal, &sArrivedSignal);
fprintf(stdout, "%d(%s) signal caught\n", sArrivedSignal, strsignal(sArrivedSignal));
}
I could not catch any signal when epoll_wait() was interrupted.
Method 2
When I run my application under strace, epoll_wait() is never interrupted.
The problem reproduces reliably under GDB. I need help...
You can try implementing your own signal handler. If your application gets interrupted by a signal again, your handler will be called and you can see which signal was raised.
void
signal_callback_handler(int signum)
{
printf("Caught signal %d\n",signum);
exit(signum); // terminate application
}
int main()
{
// Register signal handler for all signals you want to handle
signal(SIGINT, signal_callback_handler);
signal(SIGABRT, signal_callback_handler);
signal(SIGSEGV, signal_callback_handler);
// .. and even more, if you want to
}
Not a very handy method, but it should (hopefully) let you find out which signal was raised. Take a look here to see the different signals that can be handled (note: not all signals can be handled by your own signal handler; SIGKILL and SIGSTOP, for example, cannot be caught).
Maybe you should try setting a signal handler that catches all signals, with SA_SIGINFO set in the flags,
something like this:
struct sigaction act;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
act.sa_sigaction = handle_sig; // the handler shown below
sigaction(SIGFPE, &act, 0);
sigaction(SIGHUP, &act, 0);
sigaction(SIGABRT, &act, 0);
sigaction(SIGILL, &act, 0);
sigaction(SIGALRM, &act, 0);
.
.
.
//and your handler looks like
void handle_sig (int sig, siginfo_t *info, void *ptr)
{
printf ("Signal is %d\n",sig);
}
Register the handler in your main program, and simply retry epoll_wait() when it fails with EINTR.
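A minimal sketch combining both suggestions, assuming Linux and a single epoll descriptor. The handler only stores si_signo and si_pid in sig_atomic_t variables, because printf() and exit() are not async-signal-safe; the caller reports them after epoll_wait() fails with EINTR and then retries. The names record_signal, install_recorder and wait_retrying are hypothetical. Also worth knowing, per the Linux signal(7) man page: epoll_wait() can fail with EINTR after the process is stopped by a stop signal and resumed with SIGCONT, even when no handler is installed, which may be related to the problem appearing under GDB. In that case no handler runs and the recorded values stay at zero.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <cstdio>
#include <sys/epoll.h>
#include <unistd.h>

static volatile sig_atomic_t g_lastSigno = 0;   // 0 means no handler has run yet
static volatile sig_atomic_t g_lastSender = 0;  // si_pid of the sender, if known

// Async-signal-safe: only stores values; printing happens outside the handler.
static void record_signal(int sig, siginfo_t *info, void *)
{
    g_lastSigno = sig;
    g_lastSender = info ? static_cast<sig_atomic_t>(info->si_pid) : 0;
}

// Install record_signal for one signal. SA_RESTART is omitted because
// epoll_wait() is never restarted after a handler runs anyway.
static bool install_recorder(int sig)
{
    struct sigaction act = {};
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_SIGINFO;
    act.sa_sigaction = record_signal;
    return sigaction(sig, &act, nullptr) == 0;
}

// epoll_wait() wrapper that reports the last recorded signal and retries on EINTR.
static int wait_retrying(int epfd, epoll_event *events, int maxEvents, int timeoutMs)
{
    for (;;) {
        int n = epoll_wait(epfd, events, maxEvents, timeoutMs);
        if (n >= 0 || errno != EINTR)
            return n;
        std::fprintf(stderr, "epoll_wait interrupted: signal %d from pid %d\n",
                     static_cast<int>(g_lastSigno), static_cast<int>(g_lastSender));
    }
}
```

Note that each retry starts the full timeout again; if that matters, the remaining time has to be recomputed before calling epoll_wait() again.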