Cancelling thread that is stuck on epoll_wait - c++

I'm doing some event handling with C++ and pthreads. I have a main thread that reads from an event queue I defined, and a worker thread that fills the event queue. The queue is, of course, thread-safe.
The worker thread has a list of file descriptors and creates an epoll instance to get events on those file descriptors. It uses epoll_wait to wait for events on the fds.
Now the problem: assuming I want to terminate my application cleanly, how can I cancel the worker thread properly? epoll_wait is not one of the cancellation points listed in pthreads(7), so it cannot react properly to pthread_cancel.
The worker thread's main loop looks like this:
while (m_WorkerRunning) {
    epoll_wait(m_EpollDescriptor, events, MAXEVENTS, -1);
    // handle events and insert to queue
}
m_WorkerRunning is set to true when the thread starts, and it looks like I can interrupt the thread by setting m_WorkerRunning to false from the main thread. The problem is that epoll_wait can, in theory, wait forever.
Another solution I thought about: instead of waiting forever (-1), I can wait with a finite timeout X, then handle the no-events case properly, and if m_WorkerRunning == false, exit the loop and terminate the worker thread cleanly. The main thread then sets m_WorkerRunning to false and sleeps X. However, I'm not sure about the performance of such an epoll_wait, nor what the correct X would be: 500 ms? 1 s? 10 s?
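A minimal sketch of the timeout approach described above, written in C for brevity (the same calls work from C++). The names worker_running, POLL_TIMEOUT_MS and stop_worker_demo are hypothetical, and the 200 ms timeout is only a placeholder for X. The trade-off shows up directly in the code: an idle worker still wakes once per timeout, and shutdown can take up to one full timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

#define MAX_EVENTS 16
#define POLL_TIMEOUT_MS 200   /* assumption: placeholder for "X" */

static atomic_bool worker_running = true;

static void *worker(void *arg)
{
    int epfd = *(int *)arg;
    struct epoll_event events[MAX_EVENTS];

    while (atomic_load(&worker_running)) {
        int n = epoll_wait(epfd, events, MAX_EVENTS, POLL_TIMEOUT_MS);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted: just re-check the flag */
            break;             /* unexpected error: leave the loop */
        }
        /* n == 0 means the timeout expired with no events: loop and re-check. */
        for (int i = 0; i < n; i++) {
            /* ... handle events[i] and insert it into the queue ... */
        }
    }
    return NULL;
}

/* Starts the worker, requests shutdown, joins it. Returns 0 on success. */
int stop_worker_demo(void)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, &epfd) != 0) {
        close(epfd);
        return -1;
    }

    struct timespec pause = {0, 50 * 1000000L};   /* 50 ms */
    nanosleep(&pause, NULL);                      /* let the worker block */

    atomic_store(&worker_running, false);         /* request shutdown */
    pthread_join(tid, NULL);    /* returns within about POLL_TIMEOUT_MS */
    close(epfd);
    return 0;
}
```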
I'd like to hear some advice from experienced people!
More relevant information: the fds I'm waiting for events on are devices in /dev/input, so technically I'm writing a sort of input subsystem. The target OS is Linux (latest kernel) on ARM.
Thanks!

alk's answer is almost correct. The difference, however, is very dangerous.
If you are going to send a signal in order to wake up epoll_wait, never use plain epoll_wait. You must use epoll_pwait, or you might run into a race in which your epoll never wakes up.
Signals arrive asynchronously. If your SIGUSR1 arrives after you've checked your shutdown flag, but before your loop returns to epoll_wait, then the signal will not interrupt the wait (as there is none yet), but neither will the program exit: the thread then blocks in epoll_wait with nothing left to wake it.
This might be very likely or extremely unlikely, depending on how long the loop takes in relation to how much time is spent in the wait, but it is a bug one way or the other.
Another problem with alk's answer is that it does not check why the wait was interrupted. It might be any number of reasons, some unrelated to your exit.
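For more information, see the man page for pselect. epoll_pwait works in a similar way: it atomically replaces the thread's signal mask for the duration of the wait, so a signal that is kept blocked everywhere else can only be delivered while the thread is actually waiting.
Also, never send signals to threads using kill. Use pthread_kill instead. kill directs the signal at the whole process, and the kernel delivers it to any one thread that does not block it. There is no guarantee that the correct thread will receive it, which might cause an unrelated system call to be interrupted, or nothing at all to happen.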
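A minimal sketch of the race-free pattern this answer describes, assuming SIGUSR1 is free for this purpose; the names running, wake_handler and pwait_shutdown_demo are hypothetical. SIGUSR1 is blocked before the worker is created, so the worker inherits the block, and epoll_pwait unblocks it only while waiting. A signal sent at any other moment stays pending and makes the next epoll_pwait return EINTR immediately, so the wakeup cannot be lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

#define MAX_EVENTS 16

static atomic_bool running = true;

/* Empty handler: its only job is to make epoll_pwait return EINTR. */
static void wake_handler(int sig) { (void)sig; }

static void *worker(void *arg)
{
    int epfd = *(int *)arg;
    struct epoll_event events[MAX_EVENTS];
    sigset_t wait_mask;

    /* Mask used only during the wait: the inherited mask minus SIGUSR1. */
    pthread_sigmask(SIG_SETMASK, NULL, &wait_mask);
    sigdelset(&wait_mask, SIGUSR1);

    while (atomic_load(&running)) {
        int n = epoll_pwait(epfd, events, MAX_EVENTS, -1, &wait_mask);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* woken by a signal: re-check the flag */
            break;             /* real error */
        }
        /* ... handle n events and insert them into the queue ... */
    }
    return NULL;
}

int pwait_shutdown_demo(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* Block SIGUSR1 before creating the worker so the worker inherits it. */
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &block, &old);

    int rc = -1;
    int epfd = epoll_create1(0);
    pthread_t tid;
    if (epfd >= 0 && pthread_create(&tid, NULL, worker, &epfd) == 0) {
        struct timespec pause = {0, 50 * 1000000L};
        nanosleep(&pause, NULL);          /* let the worker start waiting */
        atomic_store(&running, false);    /* 1. publish the shutdown request */
        pthread_kill(tid, SIGUSR1);       /* 2. then wake that specific thread */
        pthread_join(tid, NULL);
        rc = 0;
    }
    if (epfd >= 0)
        close(epfd);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return rc;
}
```

The order matters: the flag is set before pthread_kill, so whichever way the race goes, the worker sees the flag after its wait returns (or before it starts one).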

You could send the thread a signal, which would interrupt the blocking call to epoll_wait(). If you do so, modify your code like this:
while (m_WorkerRunning)
{
    int result = epoll_wait(m_EpollDescriptor, events, MAXEVENTS, -1);
    if (-1 == result)
    {
        if (EINTR == errno)
        {
            /* Handle shutdown request here. */
            break;
        }
        else
        {
            /* Error handling goes here. */
            continue; /* result is -1, so there are no events to handle. */
        }
    }
    /* Handle events and insert to queue. */
}
A way to add a signal handler:
#include <signal.h>

/* A generic signal handler doing nothing */
void signal_handler(int sig)
{
    (void)sig; /* Silence the unused-parameter warning. */
}

/* Wrapper to set a signal handler. The parameter must not be named
   sa_handler: glibc defines sa_handler as a macro. */
int signal_handler_set(int sig, void (*handler)(int))
{
    struct sigaction sa = {0};
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
To set this handler for the signal SIGUSR1 do:
if (-1 == signal_handler_set(SIGUSR1, signal_handler))
{
    perror("signal_handler_set() failed");
}
To send a signal SIGUSR1 from another process:
if (-1 == kill(<target process' pid>, SIGUSR1))
{
    perror("kill() failed");
}
To have a process send a signal to itself:
if (-1 == raise(SIGUSR1))
{
    perror("raise() failed");
}

Related

How to kill/signal other child processes from another child process?

I am having a bit of trouble figuring out exactly how the kill(pid_t pid, int sig) function works when using it in a child process. In my program, I have the parent and 8 child processes created with fork(). I've been searching and reading online, to no avail unfortunately: all of the search results are about killing children from the parent process.
I have signal handlers set up in each process, except they are not working correctly.
Basically, I need to signal all of the processes in the process group from the child process "signal_generating_process", but for some reason the signals are not going through correctly.
The kill(2) man page says that if I use 0 as the first argument, it will send the signal to all processes in the process group, but it's not working for me. I'll include the code for the signal generator as well as one of the signal handlers. Feel free to ask for more information if I haven't included enough. Thank you all very much!
void signal_generating_process(){
    signal(SIGINT, end_process_handler);
    block_sigusr1();
    block_sigusr2();
    while(true){
        millisleep(randomFloat(.01,.1)); //function I created to sleep for a certain amount of milliseconds
        int sig = rand_signal(); //randomly picks between sigusr1 and sigusr2
        kill(0, sig);
        if(sig == SIGUSR1){ //adds to a counter for sigusr1
            pthread_mutex_lock(&shm_ptr->mutex1_sent);
            shm_ptr->SIGUSR1_sent++;
            pthread_mutex_unlock(&shm_ptr->mutex1_sent);
        }
        else{ //signal == SIGUSR2 - adds to a counter for sigusr2
            pthread_mutex_lock(&shm_ptr->mutex2_sent);
            shm_ptr->SIGUSR2_sent++;
            pthread_mutex_unlock(&shm_ptr->mutex2_sent);
        }
    }
}
Process that handles sigusr1 and the signal handler (sigusr2 is the same):
void sigusr1_receiving_process(){
    block_sigusr2();
    signal(SIGUSR1, sigusr1_handler);
    signal(SIGINT, end_process_handler);
    while(true){
        sleep(1);
    }
}
void sigusr1_handler(int signal){
    printf("Signal 1 Received\n");
    if(signal == SIGUSR1){
        pthread_mutex_lock(&shm_ptr->mutex1_received);
        shm_ptr->SIGUSR1_received++;
        pthread_mutex_unlock(&shm_ptr->mutex1_received);
    }
}
When these loops go through, "Signal 1 Received" is never printed throughout the course of the entire execution. Is there anything that you can tell is obviously wrong with how I'm handling signals?
Edit: I fixed my problem! Unfortunately, it had nothing to do with what I have above, so I apologize for people who find this question in the future looking for an answer.
Anyway, if you do stumble upon it, maybe it has to do with the way you block signals. I blocked signals incorrectly in the parent process, and because the signal mask is inherited across fork(), they stayed blocked in the child processes. If you're having this issue, check how you have blocked signals.
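The fix described in the edit comes down to fork() copying the parent's signal mask into the child. A small sketch (the function name child_inherits_block is hypothetical) that blocks SIGUSR1 in the parent, forks, and reports from the child whether the block was inherited; a child that is meant to handle the signal has to unblock it itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child inherited the blocked SIGUSR1, 0 if not, -1 on error. */
int child_inherits_block(void)
{
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: query the mask it started with (set == NULL only reads it). */
        sigset_t cur;
        sigprocmask(SIG_BLOCK, NULL, &cur);
        int inherited = (sigismember(&cur, SIGUSR1) == 1);
        /* A child that is supposed to handle SIGUSR1 must unblock it itself. */
        sigprocmask(SIG_UNBLOCK, &block, NULL);
        _exit(inherited ? 1 : 0);
    }

    sigprocmask(SIG_SETMASK, &old, NULL);   /* parent: restore its own mask */
    if (pid < 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```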

winsock2 trying to interrupt recvfrom() in worker thread by calling closesocket(..) from main

I have a UDP multicast receiver that starts an std::thread and then detach()es it to handle receiving messages; it should be able to interrupt the blocking call when asked to.
void startThread()
{
    if (flagIsNotset)
    {
        //set flag
        std::thread t(&receiveData, this, socketDetails);
        t.detach();
    }
}
void receiveData(socketDetails)
{
    //initialise socket
    //bind socket to interface
    //bind multicast address & port to interface
    while (true)
    {
        //recvfrom(...)
        //handle received data
    }
    //print a message to know that the thread has exited the while loop
    //unset flag
}
void interrupt()
{
    //"unset flag" was moved here after I realized the thread did not reach the end of receiveData() when closesocket was called for the first time.
    closesocket(s);
}
Everything was fine and I can receive UDP messages, but when interrupt() is called for the first time, the thread does not reach the message-printing and flag-unsetting part. After moving the flag-unsetting code into interrupt(), I can confirm that the first time (calling interrupt() then startThread()) the message is not printed, but on subsequent calls the thread reaches the end of receiveData() as intended.
My questions are as follows:
1) What happened to the first thread that was detached? is it stuck in recvfrom forever and was not interrupted by closesocket() ?
2) Will this cause a memory leak? That is my primary concern. Since I can't figure out the state of the first detached thread, would making the thread t a private member and calling t::~thread() in interrupt() help ensure that the thread is terminated when interrupt() is called?
3) Repeating startThread() and interrupt() does not seem to create a duplicate thread listening on the same socket (i.e. I always receive one copy of the multicast message).
4) If you have any suggestions to improve this code, please feel free to comment.

C++: Thread synchronization scenario on Linux Platform

I am implementing a multithreaded C++ program for the Linux platform where I need functionality similar to WaitForMultipleObjects().
While searching for a solution, I found articles that describe how to achieve WaitForMultipleObjects() functionality on Linux, with examples, but those examples do not satisfy the scenario that I have to support.
The scenario in my case is pretty simple. I have a daemon process in which the main thread exposes a method/callback to the outside world, for example to a DLL. The code of the DLL is not under my control. The same main thread creates a new thread, "Thread 1". Thread 1 has to execute a kind of infinite loop in which it waits for either a shutdown event (daemon shutdown) or a data-available event signaled through the exposed method/callback mentioned above.
In short, the thread waits on a shutdown event and a data-available event: if the shutdown event is signaled, the wait is satisfied and the loop is broken; if the data-available event is signaled, the wait is also satisfied and the thread does its business processing.
On Windows, it seems very straightforward. Below is the Windows-based pseudocode for my scenario.
//**Main thread**
//Load the DLL
LoadLibrary("some DLL");
//Create a new thread
hThread1 = (HANDLE)_beginthreadex(..., &ThreadProc, ...);
//callback in main thread (mentioned in above description) which would be called by the DLL
void Callbackfunc(data)
{
    qdata.push(data);
    SetEvent(s_hDataAvailableEvent);
}
void OnShutdown()
{
    SetEvent(g_hShutdownEvent);
    WaitForSingleObject(hThread1, INFINITE);
    //Cleanup here
}
//**Thread 1**
unsigned int WINAPI ThreadProc(void *pObject)
{
    while (true)
    {
        HANDLE hEvents[2];
        hEvents[0] = g_hShutdownEvent;
        hEvents[1] = s_hDataAvailableEvent;
        //3rd parameter FALSE means the wait is satisfied if any one of the objects is signaled.
        DWORD dwEvent = WaitForMultipleObjects(2, hEvents, FALSE, INFINITE);
        switch (dwEvent)
        {
        case WAIT_OBJECT_0 + 0:
            // Shutdown event is set, break the loop
            return 0;
        case WAIT_OBJECT_0 + 1:
            //do business processing here
            break;
        default:
            // error handling
            break;
        }
    }
}
I want to implement the same on Linux. As I understand it, Linux has a totally different mechanism, where we need to register for signals: when the termination signal arrives, the process knows it is about to shut down, but before exiting it must wait for the running thread to shut down gracefully.
The correct way to do this in Linux would be using condition variables. While this is not the same as WaitForMultipleObjects in Windows, you will get the same functionality.
Use two bools to determine whether there is data available or a shutdown must occur.
Then have the shutdown function and the data function both set the bools accordingly, and signal the condition variable.
#include <pthread.h>

pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_t hThread1; // this isn't a good name for it in linux, you'd be
                    // better off with something like "tid1", but for
                    // comparison's sake, I've kept this
bool shutdown_signalled;
bool data_available;

void OnShutdown()
{
    //...shutdown behavior...
    pthread_mutex_lock(&mutex);
    shutdown_signalled = true;
    pthread_mutex_unlock(&mutex);
    pthread_cond_signal(&cv);
}

void Callbackfunc(...)
{
    // ... whatever needs to be done ...
    pthread_mutex_lock(&mutex);
    data_available = true;
    pthread_mutex_unlock(&mutex);
    pthread_cond_signal(&cv);
}

void *ThreadProc(void *args)
{
    while (true) {
        pthread_mutex_lock(&mutex);
        while (!(shutdown_signalled || data_available)) {
            // wait as long as there is no data available and a shutdown
            // has not been signalled
            pthread_cond_wait(&cv, &mutex);
        }
        if (data_available) {
            //process data
            data_available = false;
        }
        if (shutdown_signalled) {
            //do the shutdown
            pthread_mutex_unlock(&mutex);
            return NULL;
        }
        pthread_mutex_unlock(&mutex); //you might be able to put the unlock
                                      // before the ifs, idk the particulars of your code
    }
}

int main(void)
{
    shutdown_signalled = false;
    data_available = false;
    pthread_create(&hThread1, NULL, &ThreadProc, NULL);
    pthread_join(hThread1, NULL);
    //...
}
I know Windows has condition variables as well, so this shouldn't look too alien. I don't know what rules Windows has about them, but on a POSIX platform the wait needs to be inside a while loop because "spurious wakeups" can occur.
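A self-contained, runnable version of the condition-variable approach above, as a sketch. Assumptions: a simple counter (pending) stands in for the real data queue, this OnShutdown takes the thread id so it can also join the thread, and condvar_demo is a hypothetical driver. The worker drains all pending data before honouring shutdown, so nothing queued before OnShutdown is lost.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool shutdown_signalled = false;
static int pending = 0;     /* assumption: stands in for the data queue */
static int processed = 0;

/* Producer side: called (e.g. by the DLL) whenever data arrives. */
void Callbackfunc(void)
{
    pthread_mutex_lock(&mutex);
    pending++;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mutex);
}

/* Requests shutdown and waits for the worker to finish. */
void OnShutdown(pthread_t tid)
{
    pthread_mutex_lock(&mutex);
    shutdown_signalled = true;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mutex);
    pthread_join(tid, NULL);
}

void *ThreadProc(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mutex);
    for (;;) {
        /* Loop guards against spurious wakeups. */
        while (!shutdown_signalled && pending == 0)
            pthread_cond_wait(&cv, &mutex);
        while (pending > 0) {       /* "business processing" */
            pending--;
            processed++;
        }
        if (shutdown_signalled)
            break;
    }
    pthread_mutex_unlock(&mutex);
    return NULL;
}

/* Hypothetical driver: 5 data events, then shutdown. Returns items processed. */
int condvar_demo(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, ThreadProc, NULL) != 0)
        return -1;
    for (int i = 0; i < 5; i++)
        Callbackfunc();
    OnShutdown(tid);
    return processed;   /* safe to read: pthread_join synchronizes */
}
```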
If you wish to write Unix- or Linux-specific code, you have different APIs available:
pthread: provides threads, mutexes, condition variables
IPC (inter-process communication) mechanisms: mutex, semaphore, shared memory
signals
For threads, the first library is mandatory (there are lower-level syscalls on Linux, but they are more tedious to use). For events, all three may be used.
A system shutdown generates termination (SIGTERM) and then kill (SIGKILL) signals, sent to all the relevant processes, so an individual daemon shutdown can also be initiated this way. The goal of the game is to catch the signal and initiate process shutdown; since SIGKILL cannot be caught, in practice that means catching SIGTERM. The important points are:
the signal mechanism is designed so that it is not necessary to wait for signals:
Simply install a so-called handler using sigaction, and the system will do the rest.
the signal is sent to the process, and any thread that does not block it may receive it (the handler may execute in any thread's context)
You need therefore to install a signal handler (see sigaction(2)), and somehow pass the information to the other threads that the application must terminate.
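The most convenient way is probably to have a global flag which all your threads consult regularly; the signal handler sets that flag to indicate shutdown. Because the handler writes it, the flag must not be protected by a mutex (pthread_mutex_lock is not async-signal-safe); use a lock-free atomic instead. For the worker thread, shutdown means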
telling the remote host that the server is closing down,
shut down the read side of its socket (shutdown with SHUT_RD)
process all the remaining received commands/data and send answers
close the socket
exit
For the main thread, this will mean initiating a join on the worker thread, then exit.
This model should not interfere with the way data is normally processed: a blocking call to select or poll returns the error EINTR if the signal handler ran in that thread, and a thread using non-blocking calls checks the flag regularly, so it works too.
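A sketch of the handler-plus-flag model, assuming SIGTERM is the shutdown signal and that atomic_int is lock-free on the target (true on common Linux platforms). The names shutdown_flag, install_sigterm_handler and shutdown_requested are hypothetical. The handler does nothing but set the flag; each thread calls shutdown_requested() at the top of its loop and runs the shutdown steps listed above once it returns nonzero.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Lock-free atomics may be accessed from a signal handler (C11) and are
 * also safe to read from other threads; a mutex-protected flag is not. */
static atomic_int shutdown_flag = 0;

static void on_sigterm(int sig)
{
    (void)sig;
    atomic_store(&shutdown_flag, 1);   /* no locks, no I/O: just set the flag */
}

/* Installs the SIGTERM handler. Returns 0 on success, -1 on error. */
int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTERM, &sa, NULL);
}

/* What each thread consults "regularly", e.g. at the top of its loop. */
int shutdown_requested(void)
{
    return atomic_load(&shutdown_flag);
}
```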

Child process receives parent's SIGINT

I have one simple program that's using Qt Framework.
It uses QProcess to execute RAR and compress some files. In my program I am catching SIGINT and doing something in my code when it occurs:
signal(SIGINT, &unix_handler);
When SIGINT occurs, I check whether the RAR process is done, and if it isn't, I wait for it. The problem is that (I think) the RAR process also gets the SIGINT that was meant for my program, and it quits before it has compressed all the files.
Is there a way to run the RAR process so that it doesn't receive SIGINT when my program does?
Thanks
If you are generating the SIGINT with Ctrl+C on a Unix system, then the signal is being sent to the entire process group.
You need to use setpgid or setsid to put the child process into a different process group so that it will not receive the signals generated by the controlling terminal.
[Edit:]
Be sure to read the RATIONALE section of the setpgid page carefully. It is a little tricky to plug all of the potential race conditions here.
To guarantee 100% that no SIGINT will be delivered to your child process, you need to do something like this:
#define CHECK(x) do { if (!(x)) { perror(#x " failed"); abort(); /* or whatever */ } } while (0)

/* Block SIGINT. */
sigset_t mask, omask;
sigemptyset(&mask);
sigaddset(&mask, SIGINT);
CHECK(sigprocmask(SIG_BLOCK, &mask, &omask) == 0);

/* Spawn child. */
pid_t child_pid = fork();
CHECK(child_pid >= 0);
if (child_pid == 0) {
    /* Child */
    CHECK(setpgid(0, 0) == 0);
    execl(...);
    abort();
}

/* Parent */
if (setpgid(child_pid, child_pid) < 0 && errno != EACCES)
    abort(); /* or whatever */

/* Unblock SIGINT */
CHECK(sigprocmask(SIG_SETMASK, &omask, NULL) == 0);
Strictly speaking, every one of these steps is necessary. You have to block the signal in case the user hits Ctrl+C right after the call to fork. You have to call setpgid in the child in case the execl happens before the parent has time to do anything. You have to call setpgid in the parent in case the parent runs and someone hits Ctrl+C before the child has time to do anything.
The sequence above is clumsy, but it does handle 100% of the race conditions.
What are you doing in your handler? There are only certain Qt functions that you can call safely from a Unix signal handler. This page in the documentation identifies which ones they are.
The main problem is that the handler will execute outside of the main Qt event thread. That page also proposes a method to deal with this. I prefer getting the handler to "post" a custom event to the application and handle it that way. I posted an answer describing how to implement custom events here.
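The method on that Qt documentation page is a variation of the classic self-pipe trick: the handler only writes a byte to a descriptor (write() is async-signal-safe), and the event loop reads it later in normal context. The Qt page uses a socket pair watched by a QSocketNotifier; this plain C sketch uses a pipe instead, and the names sig_pipe, setup_self_pipe and read_pending_signal are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int sig_pipe[2] = {-1, -1};

/* Handler: only async-signal-safe work. */
static void forward_signal(int sig)
{
    int saved_errno = errno;            /* don't clobber errno of interrupted code */
    unsigned char b = (unsigned char)sig;
    ssize_t r = write(sig_pipe[1], &b, 1);
    (void)r;                            /* pipe full: a wakeup is already pending */
    errno = saved_errno;
}

/* Creates the pipe and routes `sig` into it. Returns 0 on success. */
int setup_self_pipe(int sig)
{
    if (pipe(sig_pipe) != 0)
        return -1;
    /* Non-blocking write end, so the handler can never block. */
    int flags = fcntl(sig_pipe[1], F_GETFL);
    if (flags < 0 || fcntl(sig_pipe[1], F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = forward_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(sig, &sa, NULL);
}

/* Called by the event loop when sig_pipe[0] becomes readable.
 * Returns the signal number, or -1 on failure. */
int read_pending_signal(void)
{
    unsigned char b;
    return read(sig_pipe[0], &b, 1) == 1 ? b : -1;
}
```

In the event loop, sig_pipe[0] is just another readable descriptor, so the actual reaction to the signal (for example posting a custom event) runs in the main thread with no async-signal-safety restrictions.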
Just make the subprocess ignore SIGINT:
child_pid = fork();
if (child_pid == 0) {
    /* child process */
    signal(SIGINT, SIG_IGN);
    execl(...);
    _exit(127); /* only reached if execl() failed */
}
man sigaction:
During an execve(2), the dispositions of handled signals are reset to the default;
the dispositions of ignored signals are left unchanged.
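A small sketch that checks this behaviour, assuming /bin/sh is available; run_child_ignoring_sigint is a hypothetical name. The child ignores SIGINT and execs a shell that sends SIGINT to itself; the shell exits normally with status 0 only if the ignored disposition survived the exec.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the child's exit status, -signo if it was killed, -1 on error. */
int run_child_ignoring_sigint(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGINT, SIG_IGN);   /* ignored dispositions are kept across exec */
        execl("/bin/sh", "sh", "-c", "kill -INT $$; exit 0", (char *)NULL);
        _exit(127);                /* only reached if execl() failed */
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);  /* the SIGINT was not ignored */
    return WEXITSTATUS(status);
}
```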

ACE Reactor quits on interrupted system call

I have an ACE reactor that accepts socket connections and listens for the incoming data on those connections. The reactor runs in a dedicated thread. This is the thread's entry function:
int TcpServer::svc()
{
    LogDebug("The TCP server on %i is running", mLocalAddr.get_port_number());
    // The current thread will own the reactor. By default, a reactor is owned by
    // the creating thread, and a reactor cannot run in a thread that does not own it.
    if (mReactor.owner(ACE_Thread::self()) != 0)
    {
        LogThrow("Could not change the owner of the reactor");
    }
    if (mReactor.run_reactor_event_loop() != 0)
    {
        LogWarning("Reactor loop has quit with an error.");
    }
    return 0;
}
Once in a while run_reactor_event_loop exits with -1, and errno reports "interrupted system call". How can I handle this situation? As far as I know, I have two options: call run_reactor_event_loop again, or configure the interrupted call to be restarted automatically using sigaction and SA_RESTART.
Is it safe to call run_reactor_event_loop again?
What does the ACE_Reactor::restart method do? It looks like it is supposed to restart the loop. Will it help?
How safe is it to turn on SA_RESTART? Does it mean, for example, that ^C won't stop my application?
Are there any other ways to handle the situation?
Check how the Reactor is constructed. The ACE_Reactor::open() call takes a "restart" parameter (default false) that tells it to restart the handle_events method automatically after an interruption.
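Outside ACE, the first option from the question (simply issuing the wait again) usually looks like the retry loop below. This is a generic POSIX sketch with poll(), not ACE's implementation; poll_retry_eintr and its stop parameter are hypothetical. The stop check addresses the ^C concern: if a SIGINT handler only sets a flag, a blind retry would go straight back to waiting without ever noticing it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>

/* Retries poll() after EINTR unless *stop has been set by a signal handler.
 * Returns poll()'s result; -1 with errno == EINTR means "stop requested".
 * Note: with a positive timeout, each retry restarts the full timeout. */
int poll_retry_eintr(struct pollfd *fds, nfds_t nfds, int timeout_ms,
                     const volatile sig_atomic_t *stop)
{
    for (;;) {
        int n = poll(fds, nfds, timeout_ms);
        if (n >= 0 || errno != EINTR)
            return n;              /* events, timeout, or a real error */
        if (stop != NULL && *stop)
            return -1;             /* interrupted by a shutdown request */
    }
}
```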