Not able to catch SIGINT signal while using select() - c++

I'm trying to handle signals while waiting on a listening socket in the select syscall.
Problem: I have a working loop around a select call. select waits until the socket descriptor is ready.
I need to break out of the loop on SIGINT or SIGQUIT, close resources correctly, and exit the program.
The code is below:
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <string.h>
#include <error.h>
#include <errno.h>
#include <signal.h>
#include <syslog.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <netinet/in.h>
bool bBreakJob = false;
void sig_handler(int sig)
{
switch(sig)
{
case SIGHUP:
// need to reload config
break;
case SIGINT:
printf("SIGINT \n");
bBreakJob = true;
openlog("mydaemon", LOG_PID | LOG_CONS, LOG_DAEMON);
syslog(LOG_INFO, "Catched SIGINT");
closelog();
break;
case SIGQUIT:
printf("SIGQUIT \n");
openlog("mydaemon", LOG_PID | LOG_CONS, LOG_DAEMON);
syslog(LOG_INFO, "Catched SIGQUIT");
bBreakJob = true;
break;
case SIGPIPE:
printf("SIGPIPE \n");
break;
}
}
int main(int argc, char** argv)
{
struct sigaction act, oact;
sigset_t set;
sigemptyset(&set);
sigaddset(&set, SIGINT);
sigaddset(&set, SIGHUP);
sigaddset(&set, SIGPIPE);
sigaddset(&set, SIGQUIT);
sigprocmask(SIG_UNBLOCK, &set, NULL);
act.sa_mask = set;
act.sa_handler = sig_handler;
act.sa_flags = 0;
sigaction(SIGINT, &act, NULL);
sigaction(SIGHUP, &act, NULL);
sigaction(SIGPIPE, &act, NULL);
sigaction(SIGQUIT, &act, NULL);
int fds[1], res, fmax; // one slot for the listening socket (a zero-length array cannot hold it)
fd_set wset;
fd_set rset;
//next line code to open socket
int listen_socket = socket(AF_INET, SOCK_STREAM, 0);
int iFlags = fcntl(listen_socket, F_GETFL);
iFlags |= O_NONBLOCK;
fcntl(listen_socket, F_SETFL, iFlags);
struct sockaddr_in sin;
memset(&sin, 0, sizeof(sin));
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl(INADDR_ANY);
sin.sin_port = htons(4000);
bind(listen_socket, (struct sockaddr *)&sin, sizeof(sin));
listen(listen_socket, 20);
fds[0] = listen_socket;
FD_ZERO(&wset);
FD_SET(fds[0], &wset);
fmax = fds[0] + 1;
while (FD_ISSET(fds[0], &wset))
{
rset = wset;
res = select(fmax, &rset, NULL, NULL, NULL);
if (res < 0)
{
if (errno == EINTR)
{ //debug message
printf("Loop broken by select's result EINTR");
break;
} else
{
printf("select(...) fails in listed loop. errno %d (%s)", errno, strerror(errno));
exit(1);
}
}
else if (res == 0)
{
//if timeout is handled
}
else if (res > 0)
{
if (FD_ISSET(fds[0], &rset))
{
//Handle socket input
}
}
if(bBreakJob)
{
printf("Loop broken by signal handler");
break;
}
} //while( 1 );
FD_CLR(fds[0], &wset);
if(bBreakJob)
{ //debug message
printf("signal SIGINT is handled ");
}
}
SIGINT never reaches sig_handler. I tried to debug this in the Qt Creator IDE: select is just interrupted and then goes back to listening. The condition "if (errno == EINTR)" is never even reached. There are no debug messages in either the console or syslog. At the same time, SIGQUIT works fine: sig_handler is called and the condition "if (errno == EINTR)" is reached too.
As you can see, I've tried to detect SIGINT in two ways: with a flag set by the signal handler, and from the result of select.
I tried to find an answer in the topic "Not able to catch SIGINT signal while using select()", but could not find a solution. I have seen this problem on other web resources, but there was no solution there either.
The SIGINT signal is sent from the command line: "kill -s 2 (PID)"
UPD: The problem is solved. The issue was the debugger: under the debugger, SIGINT does not work properly. Running the program without the debugger works as expected.

The interaction of select and signals is tricky, because the signal could always arrive right before you call select. Here are two ways to wake up a select loop from a signal handler:
The "self-pipe trick": Create a pipe and add the read end to your select read set. From your signal handler, write one byte to the write end of this pipe, and it will make the select return immediately (because input is ready).
Rather than pass NULL as the final argument to select, pass a pointer to a timeval that is a global variable. Within your signal handler, make the timeval 0 seconds. Hence, if a signal arrives before you call select, select will ask for a 0 timeout and return immediately.

Related

How to terminate windows sockets when internet is down? (C++ WinAPI)

I have set up a Winsock2 connection but I need to cover the case where internet is down. Here is my code;
#include <winsock2.h>
#include <windows.h>
#include <ctime>
int main()
{
WSADATA w;
if(WSAStartup(MAKEWORD(2,2),&w)) return 0;
sockaddr_in sad;
sad.sin_family=AF_INET;
sad.sin_addr.s_addr=inet_addr("200.20.186.76");
sad.sin_port=htons(123);
sockaddr saddr;
int saddr_l=sizeof(saddr);
int s=socket(PF_INET,SOCK_DGRAM,IPPROTO_UDP);
if(s==INVALID_SOCKET) return 0;
char msg[48]={8};
if(sendto(s,msg,sizeof(msg),0,(sockaddr*)&sad,sizeof(sad))==SOCKET_ERROR) return 0;
if(recvfrom(s,msg,48,0,&saddr,&saddr_l)==SOCKET_ERROR) return 0;
if(closesocket(s)==SOCKET_ERROR) return 0;
if(WSACleanup()) return 0;
return 0;
}
Here it waits for the call to return as it's documented. I have two questions.
Can I set a timeout, like we can when using select?
How else can I prevent the waiting and make it return immediately? Documentation states that:
When issuing a blocking Winsock call such as sendto, Winsock may need to wait for a network event before the call can complete. Winsock performs an alertable wait in this situation, which can be interrupted by an asynchronous procedure call (APC) scheduled on the same thread.
How to do that?
If you want to issue a recvfrom() and have it return immediately, then decide on your own how long to wait (I'm assuming Windows since you included winsock2.h), you can make an asynchronous OVERLAPPED request, then wait for the completion at any time by waiting for the hEvent member of the OVERLAPPED struct to be signaled.
Here's an updated sample based off your original code.
You set the timeout by waiting as long as you need with WaitForSingleObject (below I wait for 10 seconds, up to 6 times).
By passing an OVERLAPPED pointer, you are indicating that you will wait for the completion yourself. Note that the OVERLAPPED struct cannot go out of scope (or be freed, if it was dynamically allocated) until the hEvent is signaled.
Letting the OVERLAPPED go out of scope before guaranteeing that the I/O has completed is a common Winsock bug (I've been working on Winsock for over 10 years, and I've seen many variations of this bug).
As commented below, if you don't know that hEvent has been signaled, then after calling closesocket you must wait for hEvent to be signaled before continuing: closesocket does not guarantee that all asynchronous I/O requests have completed before it returns.
#define _WINSOCK_DEPRECATED_NO_WARNINGS
#include <winsock2.h>
#include <windows.h>
#include <ctime>
int main()
{
WSADATA w;
if (WSAStartup(MAKEWORD(2, 2), &w)) return 0;
sockaddr_in sad;
sad.sin_family = AF_INET;
sad.sin_addr.s_addr = inet_addr("200.20.186.76");
sad.sin_port = htons(123);
sockaddr saddr;
int saddr_l = sizeof(saddr);
int s = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
if (s == INVALID_SOCKET) return 0;
char msg[48] = { 8 };
if (sendto(s, msg, sizeof(msg), 0, (sockaddr*)&sad, sizeof(sad)) == SOCKET_ERROR) return 0;
OVERLAPPED ov{};
ov.hEvent = CreateEvent(nullptr, TRUE, FALSE, nullptr);
if (ov.hEvent == nullptr) return 0;
WSABUF wsabuffer{};
wsabuffer.buf = msg;
wsabuffer.len = 48;
DWORD flags = 0;
if (WSARecvFrom(s, &wsabuffer, 1, nullptr, &flags, &saddr, &saddr_l, &ov, nullptr) == SOCKET_ERROR)
{
DWORD gle = WSAGetLastError();
if (gle != WSA_IO_PENDING) return 0;
}
for (DWORD recv_count = 0; recv_count < 6; ++recv_count)
{
DWORD wait = WaitForSingleObject(ov.hEvent, 10000);
if (wait == WAIT_FAILED) return 0;
if (wait == WAIT_OBJECT_0) break; // WSARecvFrom completed
if (wait == WAIT_TIMEOUT) continue; // WSARecvFrom is still pended waiting for data
}
// assuming WSARecvFrom completed - i.e. ov.hEvent was signaled
DWORD transferred;
if (WSAGetOverlappedResult(s, &ov, &transferred, FALSE, &flags))
{
// WSARecvFrom completed successfully - 'transferred' shows the # of bytes that were received
}
else
{
DWORD gle = WSAGetLastError();
gle;
// WSARecvFrom failed with the error code in 'gle'
}
if (closesocket(s) == SOCKET_ERROR) return 0;
// with real code, we must guarantee that hEvent is set after calling closesocket
// e.g. if we get here in an error path
// closesocket() won't guarantee all async IO has completed before returning
WaitForSingleObject(ov.hEvent, INFINITE);
if (WSACleanup()) return 0;
return 0;
}

how to detect eventfd file descriptor close by other program

I have a client and a server that communicate through an eventfd. If either the client or the server calls close(fd), I would like the other end to find out (i.e. that the file descriptor is now closed). I tried using select with a non-zero timeout, but it always returns 0, which means timeout. I saw people suggesting fcntl, but that doesn't seem to work either. Any suggestions?
Additional details (unimportant parts of the code are omitted; you can see here for the detailed code that exchanges the file descriptor):
It is a multi-process application. The server process creates the eventfd by calling
struct msghdr control_message;
int fd = eventfd(0,0);
*CMSG_DATA(control_message) = fd;
message.msg_control = &control_message;
sendmsg(socket_fd, & message,0); //send this to client
From client side:
recvmsg(socket_fd, & message,0);
//loop using CMSG_NXTHDR(&message, control_message)
int fd = *(int *) CMSG_DATA(control_message);
Then on server side:
close(fd);
On Client side:
int rc;
rc = dup2(fd,fd);
rc is never invalid.
Checking for a closed file descriptor? How about this?
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h> /* dup2, write, close */
static void checkit ()
{
int rc;
rc = dup2(2, 2);
if ( rc == -1 )
printf("error %d on dup2(2, 2): %s\n", errno, strerror(errno));
else
printf("dup2 successful\n");
write(2, "still working\n", 14);
}
int main (int argc, char **argv)
{
int rc;
checkit();
close(2);
checkit();
return 0;
}
Running it generates this output:
dup2 successful
still working
error 9 on dup2(2, 2): Bad file descriptor
If this is a multi-threaded application using poll and you want poll to return when the file descriptor is closed by another thread, POLLERR, POLLHUP, or POLLNVAL might help.
Multi-Threaded Version using Poll
And here's a sample that shows how to detect a closed fd with poll (POLLNVAL is the event) in a multi-threaded program:
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
static void *run_poll (void *arg)
{
struct pollfd fds[1];
int rc;
fds[0].fd = 2;
fds[0].events = POLLERR | POLLHUP | POLLNVAL;
//
// Poll indefinitely
//
printf("starting poll\n");
fflush(stdout);
rc = poll((struct pollfd *) &fds, 1, -1);
if ( rc == -1 )
{
printf("POLL returned error %d: %s\n", errno, strerror(errno));
}
else
{
printf("POLL returned %d (revents = 0x%08x): %s %s %s\n",
rc,
fds[0].revents,
( ( fds[0].revents & POLLERR ) ? "pollerr" : "noerr" ),
( ( fds[0].revents & POLLHUP ) ? "pollhup" : "nohup" ),
( ( fds[0].revents & POLLNVAL ) ? "pollnval" : "nonval" ));
}
return NULL;
}
int main (int argc, char **argv)
{
pthread_t thread;
int rc;
rc = pthread_create(&thread, NULL, run_poll, NULL);
usleep(100);
printf("closing stderr\n");
close(2);
usleep(100);
return 0;
}
This generates the output
starting poll
closing stderr
POLL returned 1 (revents = 0x00000020): noerr nohup pollnval

Can I use select to combine stdin and accept?

I am trying to implement a server in C++/Linux that regularly takes user input from the terminal. Initially I had implemented two separate threads to handle this behavior. But I realized that I would need something like pthread_cancel to cancel the server thread in case the user wanted to shut down the server.
I then decided that it might be better to handle both actions in the same thread, so I don't have to worry about resource leakage. So what I have now is a 'select' call that selects over the stdin fd as well as my accepting fd. My code looks something like this...
fd_set readfds;
FD_SET(acceptfd, &readfds);
FD_SET(stdinfd, &readfds);
while(1) {
select(n, &readfds, NULL, NULL, NULL);
....
}
For some reason I am no longer able to read input from stdin. This works fine when I remove either one of the two fds from my fd set; the other one performs as expected. But when I leave them both in, while the acceptfd still accepts incoming connections, the stdinfd fails to respond to terminal input.
Does anyone know what I might be doing wrong here? Is this approach inherently flawed? Should I be focusing on keeping the two actions as separate threads and figuring out a way to exit cleanly instead?
Thanks for reading!!
As Ambroz commented, multiplexing stdin and some listened fd is possible.
But select is an old, nearly obsolete syscall; you should prefer poll(2). If you insist on using the select(2) syscall, clear readfds with FD_ZERO at the start of each loop iteration. The FD_SET macros should also be inside the while loop, because select is permitted to modify readfds.
The poll syscall is preferable to select because select imposes a built-in limit on the file descriptors it can handle (FD_SETSIZE, typically 1024, while the kernel today can deal with many more fds, e.g. 65536). In other words, select requires every fd to be < 1024, which a process with many open descriptors can no longer guarantee. poll is able to deal with any set of fds. The first argument to poll is an array (which you could calloc if you wanted to) whose size is the number of fds you want to multiplex. In your case, it is two (stdin and the listening fd), so you can make it a local variable. Be sure to clear and initialize it before every call to poll.
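As a rough sketch of the poll approach described above: the helper name poll_two and its bitmask return convention are made up for this illustration. It treats a hang-up (POLLHUP) as readable, since a subsequent read would return end-of-file.

```c
#include <poll.h>
#include <unistd.h> /* STDIN_FILENO, for callers that pass stdin as fd_a */

/* Wait for input on two descriptors at once.
 * Returns a bitmask: bit 0 set if fd_a is readable, bit 1 set if fd_b is
 * readable; 0 on timeout; -1 on error. A negative timeout_ms waits forever. */
static int poll_two(int fd_a, int fd_b, int timeout_ms)
{
    struct pollfd pfds[2];
    int n, mask = 0;

    /* Initialize the array before every call, as recommended above. */
    pfds[0].fd = fd_a;
    pfds[0].events = POLLIN;
    pfds[0].revents = 0;
    pfds[1].fd = fd_b;
    pfds[1].events = POLLIN;
    pfds[1].revents = 0;

    n = poll(pfds, 2, timeout_ms);
    if (n < 0) return -1;

    if (pfds[0].revents & (POLLIN | POLLHUP)) mask |= 1;
    if (pfds[1].revents & (POLLIN | POLLHUP)) mask |= 2;
    return mask;
}
```

In the server from the question, a call such as poll_two(STDIN_FILENO, acceptfd, -1) inside the loop would then tell you whether to read a line from the terminal, call accept, or both.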
You could debug with a debugger like gdb or just use strace
This epoll code works for me:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#define PORT 4711
int main(void) {
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
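struct sockaddr_in addr;
memset(&addr, 0, sizeof (addr)); // zero the unused fields
addr.sin_family = AF_INET;
addr.sin_port = htons(PORT);
addr.sin_addr.s_addr = htonl(INADDR_ANY); // s_addr is 32 bits wide: htonl, not htons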
bind(sockfd, (struct sockaddr*) &addr, sizeof (addr));
listen(sockfd, 10);
int epollfd = epoll_create1(0);
struct epoll_event event;
// add stdin
event.events = EPOLLIN|EPOLLPRI|EPOLLERR;
event.data.fd = STDIN_FILENO;
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, STDIN_FILENO, &event) != 0) {
perror("epoll_ctl add stdin failed.");
return 1;
}
// add socket
event.events = EPOLLIN|EPOLLPRI|EPOLLERR;
event.data.fd = sockfd;
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, sockfd, &event) != 0) {
perror("epoll_ctl add sockfd failed.");
return 1;
}
char *line = NULL;
size_t linelen = 0;
for (;;) {
int fds = epoll_wait(epollfd, &event, 1, -1);
if (fds < 0) {
perror("epoll_wait failed.");
return 2;
}
if (fds == 0) {
continue;
}
if (event.data.fd == STDIN_FILENO) {
// read input line
int read = getline(&line, &linelen, stdin);
if (read < 0) {
perror("could not getline");
return 3;
}
printf("Read: %.*s", read, line);
} else if (event.data.fd == sockfd) {
// accept client
struct sockaddr_in client_addr;
socklen_t addrlen = sizeof (client_addr);
int clientfd = accept(sockfd, (struct sockaddr*) &client_addr, &addrlen);
if (clientfd == -1) {
perror("could not accept");
return 4;
}
send(clientfd, "Bye", 3, 0);
close(clientfd);
} else {
// cannot happen™
fprintf(stderr, "Bad fd: %d\n", event.data.fd);
return 5;
}
}
close(epollfd);
close(sockfd);
return 0;
}

Stopping a receiver thread that blocks on recv()

I have a chat application that has a separate thread to listen for incoming messages.
while (main thread not calling for receiver to quit) {
string message = tcpCon.tcpReceive(); // Relies on the recv() function
processIncomingMessage(message);
}
This way of working has one big problem. Most of the time, the loop will be blocking on recv() so the receiver thread won't quit. What would be a proper way to tackle this issue without forcing thread termination after a couple of seconds?
Close the socket with shutdown() to close it for all receivers.
This prints out 'recv returned 0' on my system, indicating that the receiver saw an orderly shutdown. Comment out shutdown() and watch it hang forever.
Longer term, OP should fix the design, either using select or including an explicit quit message in the protocol.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <pthread.h>
/* Free on my system. YMMV */
int port = 7777;
int cd;
void *f(void *arg)
{
/* Hack: proper code would synchronize here */
sleep(1);
/* This works: */
shutdown(cd, SHUT_RDWR);
close(cd);
return 0;
}
int main(void)
{
/* Create a fake server which sends nothing */
int sd = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in sa = { 0 };
const int on = 1;
char buf;
pthread_t thread;
sa.sin_family = AF_INET;
sa.sin_addr.s_addr = htonl(INADDR_ANY);
sa.sin_port = htons(port);
setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
/* Other error reporting omitted for clarity */
if (bind(sd, (const struct sockaddr*)&sa, sizeof sa) < 0) {
perror("bind");
return EXIT_FAILURE;
}
/* Create a client */
listen(sd, 1);
cd = socket(AF_INET, SOCK_STREAM, 0);
connect(cd, (const struct sockaddr*)&sa, sizeof sa);
accept(sd, 0, 0);
/* Try to close socket on another thread */
pthread_create(&thread, 0, f, 0);
printf("recv returned %d\n", (int)recv(cd, &buf, 1, 0)); /* recv returns ssize_t; cast to match %d */
pthread_join(thread, 0);
return 0;
}
You could use select() to wait for incoming data and avoid blocking in recv(). select() will also block, but you can have it time out after a set interval so that the while loop can continue and check for signals to quit from the main thread:
while (main thread not calling for receiver to quit) {
if (tcpCon.hasData(500)) { // Relies on select() to determine that data is
// available; times out after 500 milliseconds
string message = tcpCon.tcpReceive(); // Relies on the recv() function
processIncomingMessage(message);
}
}
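If you close the socket in another thread, then recv() will exit.
Calling close on the socket from any other thread will make the recv call fail instantly. (Note that this is platform-dependent: as the shutdown() example above shows, on some systems recv() keeps blocking unless shutdown() is called first.)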

Waitpid equivalent with timeout?

Imagine I have a process that starts several child processes. The parent needs to know when a child exits.
I can use waitpid, but then if/when the parent needs to exit I have no way of telling the thread that is blocked in waitpid to exit gracefully and join it. It's nice to have things clean up themselves, but it may not be that big of a deal.
I can use waitpid with WNOHANG, and then sleep for some arbitrary time to prevent a busy wait. However then I can only know if a child has exited every so often. In my case it may not be super critical that I know when a child exits right away, but I'd like to know ASAP...
I can use a signal handler for SIGCHLD, and in the signal handler do whatever I was going to do when a child exits, or send a message to a different thread to do some action. But using a signal handler obfuscates the flow of the code a little bit.
What I'd really like to do is use waitpid on some timeout, say 5 sec. Since exiting the process isn't a time critical operation, I can lazily signal the thread to exit, while still having it blocked in waitpid the rest of the time, always ready to react. Is there such a call in linux? Of the alternatives, which one is best?
EDIT:
Another method based on the replies would be to block SIGCHLD in all threads with pthread_sigmask(). Then in one thread, keep calling sigtimedwait() while looking for SIGCHLD. This means that I can time out on that call and check whether the thread should exit, and if not, remain blocked waiting for the signal. Once a SIGCHLD is delivered to this thread, we can react to it immediately, and in line of the wait thread, without using a signal handler.
Don't mix alarm() with wait(). You can lose error information that way.
Use the self-pipe trick. This turns any signal into a select()able event:
int selfpipe[2];
void selfpipe_sigh(int n)
{
int save_errno = errno;
(void)write(selfpipe[1], "",1);
errno = save_errno;
}
void selfpipe_setup(void)
{
static struct sigaction act;
if (pipe(selfpipe) == -1) { abort(); }
fcntl(selfpipe[0],F_SETFL,fcntl(selfpipe[0],F_GETFL)|O_NONBLOCK);
fcntl(selfpipe[1],F_SETFL,fcntl(selfpipe[1],F_GETFL)|O_NONBLOCK);
memset(&act, 0, sizeof(act));
act.sa_handler = selfpipe_sigh;
sigaction(SIGCHLD, &act, NULL);
}
Then, your waitpid-like function looks like this:
int selfpipe_waitpid(void)
{
static char dummy[4096];
fd_set rfds;
struct timeval tv;
int died = 0, st;
tv.tv_sec = 5;
tv.tv_usec = 0;
FD_ZERO(&rfds);
FD_SET(selfpipe[0], &rfds);
if (select(selfpipe[0]+1, &rfds, NULL, NULL, &tv) > 0) {
while (read(selfpipe[0],dummy,sizeof(dummy)) > 0);
while (waitpid(-1, &st, WNOHANG) > 0) died++; /* 0 means children remain but none has exited yet */
}
return died;
}
You can see in selfpipe_waitpid() how you can control the timeout and even mix with other select()-based IO.
Fork an intermediate child, which forks the real child and a timeout process and waits for all (both) of its children. When one exits, it'll kill the other one and exit.
pid_t intermediate_pid = fork();
if (intermediate_pid == 0) {
pid_t worker_pid = fork();
if (worker_pid == 0) {
do_work();
_exit(0);
}
pid_t timeout_pid = fork();
if (timeout_pid == 0) {
sleep(timeout_time);
_exit(0);
}
pid_t exited_pid = wait(NULL);
if (exited_pid == worker_pid) {
kill(timeout_pid, SIGKILL);
} else {
kill(worker_pid, SIGKILL); // Or something less violent if you prefer
}
wait(NULL); // Collect the other process
_exit(0); // Or some more informative status
}
waitpid(intermediate_pid, 0, 0);
Surprisingly simple :)
You can even leave out the intermediate child if you're sure no other module in the program is spawning child processes of its own.
This is an interesting question.
I found sigtimedwait can do it.
EDIT 2016/08/29:
Thanks to Mark Edington for the suggestion. I've tested your example on Ubuntu 16.04, and it works as expected.
Note: this only works for child processes. It's a pity that there seems to be no equivalent of Windows' WaitForSingleObject(unrelated_process_handle, timeout) in Linux/Unix to get notified of an unrelated process's termination within a timeout.
OK, Mark Edington's sample code is here:
/* The program creates a child process and waits for it to finish. If a timeout
* elapses the child is killed. Waiting is done using sigtimedwait(). Race
* condition is avoided by blocking the SIGCHLD signal before fork().
*/
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
static pid_t fork_child (void)
{
int p = fork ();
if (p == -1) {
perror ("fork");
exit (1);
}
if (p == 0) {
puts ("child: sleeping...");
sleep (10);
puts ("child: exiting");
exit (0);
}
return p;
}
int main (int argc, char *argv[])
{
sigset_t mask;
sigset_t orig_mask;
struct timespec timeout;
pid_t pid;
sigemptyset (&mask);
sigaddset (&mask, SIGCHLD);
if (sigprocmask(SIG_BLOCK, &mask, &orig_mask) < 0) {
perror ("sigprocmask");
return 1;
}
pid = fork_child ();
timeout.tv_sec = 5;
timeout.tv_nsec = 0;
do {
if (sigtimedwait(&mask, NULL, &timeout) < 0) {
if (errno == EINTR) {
/* Interrupted by a signal other than SIGCHLD. */
continue;
}
else if (errno == EAGAIN) {
printf ("Timeout, killing child\n");
kill (pid, SIGKILL);
}
else {
perror ("sigtimedwait");
return 1;
}
}
break;
} while (1);
if (waitpid(pid, NULL, 0) < 0) {
perror ("waitpid");
return 1;
}
return 0;
}
If your program runs only on contemporary Linux kernels (5.3 or later), the preferred way is to use pidfd_open (https://lwn.net/Articles/789023/ https://man7.org/linux/man-pages/man2/pidfd_open.2.html).
This system call returns a file descriptor representing a process, and then you can select, poll or epoll it, the same way you wait on other types of file descriptors.
For example,
int fd = pidfd_open(pid, 0);
struct pollfd pfd = {fd, POLLIN, 0};
int exited = (poll(&pfd, 1, 1000) == 1); /* nonzero if the process exited within 1000 ms */
The function can be interrupted with a signal, so you could set a timer before calling waitpid() and it will exit with an EINTR when the timer signal is raised. Edit: It should be as simple as calling alarm(5) before calling waitpid().
I thought that select would return EINTR when SIGCHLD is raised by one of the children.
I believe this should work:
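while(1)
{
// the six-argument form is pselect: 'ts' is a struct timespec timeout,
// 'mask' is the signal mask in effect for the duration of the call
int retval = pselect(0, NULL, NULL, NULL, &ts, &mask);
if (retval == -1 && errno == EINTR) // some signal
{
pid_t pid = waitpid(-1, &st, WNOHANG);
if (pid > 0) // some child exited
{
// handle the exited child here
}
}
else if (retval == 0)
{
// timeout
break;
}
else // error
{
break;
}
}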
Note: you can use pselect to override current sigmask and avoid interrupts from unneeded signals.
Instead of calling waitpid() directly, you could call sigtimedwait() with SIGCHLD (which is sent to the parent process after a child exits) and wait for it to be delivered to the current thread. As the function name suggests, it supports a timeout parameter.
Please check the following code snippet for details:
static bool waitpid_with_timeout(pid_t pid, int timeout_ms, int* status) {
sigset_t child_mask, old_mask;
sigemptyset(&child_mask);
sigaddset(&child_mask, SIGCHLD);
if (sigprocmask(SIG_BLOCK, &child_mask, &old_mask) == -1) {
printf("*** sigprocmask failed: %s\n", strerror(errno));
return false;
}
timespec ts;
ts.tv_sec = MSEC_TO_SEC(timeout_ms);
ts.tv_nsec = (timeout_ms % 1000) * 1000000;
int ret = TEMP_FAILURE_RETRY(sigtimedwait(&child_mask, NULL, &ts));
int saved_errno = errno;
// Set the signals back the way they were.
if (sigprocmask(SIG_SETMASK, &old_mask, NULL) == -1) {
printf("*** sigprocmask failed: %s\n", strerror(errno));
if (ret == 0) {
return false;
}
}
if (ret == -1) {
errno = saved_errno;
if (errno == EAGAIN) {
errno = ETIMEDOUT;
} else {
printf("*** sigtimedwait failed: %s\n", strerror(errno));
}
return false;
}
pid_t child_pid = waitpid(pid, status, WNOHANG);
if (child_pid != pid) {
if (child_pid != -1) {
printf("*** Waiting for pid %d, got pid %d instead\n", pid, child_pid);
} else {
printf("*** waitpid failed: %s\n", strerror(errno));
}
return false;
}
return true;
}
Refer: https://android.googlesource.com/platform/frameworks/native/+/master/cmds/dumpstate/DumpstateUtil.cpp#46
If you're going to use signals anyways (as per Steve's suggestion), you can just send the signal manually when you want to exit. This will cause waitpid to return EINTR and the thread can then exit. No need for a periodic alarm/restart.
Due to circumstances I absolutely needed this to run in the main thread and it was not very simple to use the self-pipe trick or eventfd because my epoll loop was running in another thread. So I came up with this by scrounging together other stack overflow handlers. Note that in general it's much safer to do this in other ways but this is simple. If anyone cares to comment about how it's really really bad then I'm all ears.
NOTE: It is absolutely necessary to block signals handling in any thread save for the one you want to run this in. I do this by default as I believe it messy to handle signals in random threads.
static void ctlAlarmSignalHandler(int s); /* forward declaration; defined below */
static void ctlWaitPidTimeout(pid_t child, useconds_t usec, int *timedOut) {
int rc = -1;
static pthread_mutex_t alarmMutex = PTHREAD_MUTEX_INITIALIZER;
TRACE("ctlWaitPidTimeout: waiting on %lu\n", (unsigned long) child);
/**
* paranoid, in case this was called twice in a row by different
* threads, which could quickly turn very messy.
*/
pthread_mutex_lock(&alarmMutex);
/* set the alarm handler */
struct sigaction alarmSigaction;
struct sigaction oldSigaction;
sigemptyset(&alarmSigaction.sa_mask);
alarmSigaction.sa_flags = 0;
alarmSigaction.sa_handler = ctlAlarmSignalHandler;
sigaction(SIGALRM, &alarmSigaction, &oldSigaction);
/* set alarm, because no alarm is fired when the first argument is 0, 1 is used instead */
ualarm((usec == 0) ? 1 : usec, 0);
/* wait for the child we just killed */
rc = waitpid(child, NULL, 0);
/* if errno == EINTR, the alarm went off, set timedOut to true */
*timedOut = (rc == -1 && errno == EINTR);
/* in case we did not time out, unset the current alarm so it doesn't bother us later */
ualarm(0, 0);
/* restore old signal action */
sigaction(SIGALRM, &oldSigaction, NULL);
pthread_mutex_unlock(&alarmMutex);
TRACE("ctlWaitPidTimeout: timeout wait done, rc = %d, error = '%s'\n", rc, (rc == -1) ? strerror(errno) : "none");
}
static void ctlAlarmSignalHandler(int s) {
TRACE("ctlAlarmSignalHandler: alarm occured, %d\n", s);
}
EDIT: I've since transitioned to using a solution that integrates well with my existing epoll()-based eventloop, using timerfd. I don't really lose any platform-independence since I was using epoll anyway, and I gain extra sleep because I know the unholy combination of multi-threading and UNIX signals won't hurt my program again.
I can use a signal handler for SIGCHLD, and in the signal handler do whatever I was going to do when a child exits, or send a message to a different thread to do some action. But using a signal handler obfuscates the flow of the code a little bit.
In order to avoid race conditions you should avoid doing anything more complex than changing a volatile flag in a signal handler.
I think the best option in your case is to send a signal to the parent. waitpid() will then set errno to EINTR and return. At this point you check for waitpid return value and errno, notice you have been sent a signal and take appropriate action.
If a third party library is acceptable then the libkqueue project emulates kqueue (the *BSD eventing system) and provides basic process monitoring with EVFILT_PROC + NOTE_EXIT.
The main advantages of using kqueue or libkqueue are that it's cross-platform and doesn't have the complexity of signal handling. If your program uses async I/O you may also find it a lower-friction interface than something like epoll and the various *fd functions (signalfd, eventfd, pidfd, etc.).
#include <stdio.h>
#include <stdint.h>
#include <sys/event.h> /* kqueue header */
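#include <sys/types.h> /* for pid_t */
#include <string.h> /* for strerror */
#include <errno.h> /* for errno */
#include <unistd.h> /* for close */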
/* Link with -lkqueue */
int waitpid_timeout(pid_t pid, struct timespec *timeout)
{
struct kevent changelist, eventlist;
int kq, ret;
/* Populate a changelist entry (an event we want to be notified of) */
EV_SET(&changelist, pid, EVFILT_PROC, EV_ADD, NOTE_EXIT, 0, NULL);
kq = kqueue();
/* Call kevent with a timeout */
ret = kevent(kq, &changelist, 1, &eventlist, 1, timeout);
/* Kevent returns 0 on timeout, the number of events that occurred, or -1 on error */
switch (ret) {
case -1:
printf("Error %s\n", strerror(errno));
break;
case 0:
printf("Timeout\n");
break;
case 1:
printf("PID %u exited, status %u\n", (unsigned int)eventlist.ident, (unsigned int)eventlist.data);
break;
}
close(kq);
return ret;
}
Behind the scenes on Linux libkqueue uses either pidfd on Linux kernels >= 5.3 or a waiter thread that listens for SIGCHLD and notifies one or more kqueue instances when a process exits. The second approach is not efficient (it scans PIDs that interest has been registered for using waitid), but that doesn't matter unless you're waiting on large numbers of PIDs.
EVFILT_PROC support has been included in kqueue since its inception, and in libkqueue since v2.5.0.