Restarting RPC service - c++

I have a client process that forks a child process to listen for incoming RPCs via the svc_run() method. What I need to do is kill off that child process from the parent and then re-fork the child process, providing it a new CLIENT* to a new RPC server.
Here are the relevant parts of my code:
// Client Main
CLIENT* connectionToServer;
int pipe[2];
int childPID;
int parentPID;
static void usr2Signal()
{
ServerData sd;
clnt_destroy(connectionToServer);
(void) read(pipe[0], &sd, sizeof(sd));
// Kill child process.
kill(childPID, SIGTERM);
close(pipe[0]);
// RPC connection to the new server
CLIENT *newServerConn =
clnt_create(
sd.ip,
sd.programNum,
1,
"tcp");
if (!newServerConn)
{
// Connection error.
exit(1);
}
connectionToServer = newServerConn;
// Respawn child process.
if (pipe(pipe) == -1)
{
// Pipe error.
exit(2);
}
childPID = fork();
if (childPID == -1)
{
// Fork error.
exit(3);
}
if (childPID == 0)
{
// child closes read pipe and listens for RPCs.
close(pipe[0]);
parentPID = getppid();
svc_run();
}
else
{
// parent closes write pipe and returns to event loop.
close(pipe[1]);
}
}
int main(int argc, char *argv[])
{
/* Some initialization code */
transp = svctcp_create(RPC_ANYSOCK, 0, 0);
if (transp == NULL) {
// TCP connection error.
exit(1);
}
if (!svc_register(transp, /*other RPC program args*/, IPPROTO_TCP))
{
// RPC register error
exit(1);
}
connectionToServer = clnt_create(
192.168.x.xxx, // Server IP.
0x20000123, // Server RPC Program Number
1, // RPC Version
"tcp");
if (!connectionToServer)
{
// Connection error
exit(1);
}
// Spawn child process first time.
if (pipe(pipe) == -1)
{
// Pipe error
exit(1);
}
childPID = fork();
if (childPID == -1)
{
// Fork error.
exit(1);
}
if (childPID == 0)
{
// Close child's read pipe.
close(pipe[0]);
parentPID = getppid();
// Listen for incoming RPCs.
svc_run ();
exit (1);
}
/* Signal/Communication Code */
// Close parent write pipe.
close(pipe[1]);
// Parent runs in event loop infinitely until a signal is sent.
eventLoop();
cleanup();
}
In my server code I have a service call that initiates the new connection. This call is invoked by some other operation on the server.
// Server Services
void newserverconnection_1_svc(int *unused, struct svc_req *s)
{
// This service is defined in the server code
ServerData sd;
/* Fill sd with data:
Target IP: 192.168.a.aaa
RPC Program Number: 0x20000321
... other data
*/
connecttonewserver_1(&sd, connectionToServer); // A client service.
}
Back in my client I have the following service:
// Client Service
void connecttonewserver_1_svc(ServerData *sd, struct svc_req *s)
{
// Send the new server connection data to the parent client process
// via the pipe and signal the parent.
write(pipe[1], sd, sizeof(sd));
kill(parentPID, SIGUSR2);
}
My problem is, everything runs fine until I initiate the new connection. I do not get into any of my error sections, but about 5 seconds after setting up the new connection, my client becomes unresponsive. It does not crash and the child process seems to still be alive also, but my client will no longer receive RPCs or show any print statements when my events defined in the event loop for the parent are triggered by mouse clicks. I am probably doing something slightly wrong to spawn this new RPC loop for the child process, but I can't see what. Any ideas?

So this solution achieves the result I was looking for, but is definitely far from perfect.
static void usr2Signal()
{
ServerData sd;
// clnt_destroy(connectionToServer); // Removed this as it closes the RPC connection.
(void) read(pipe[0], &sd, sizeof(sd));
// Removed these. Killing the child process also seems to close the
// connection. Just let the child run.
// kill(childPID, SIGTERM);
// close(pipe[0]);
// RPC connection to the new server
CLIENT *newServerConn =
clnt_create(
sd.ip,
sd.programNum,
1,
"tcp");
if (!newServerConn)
{
// Connection error.
exit(1);
}
// This is the only necessary line. Note that the old
// connectionToServer pointer was not deregistered/deallocated,
// so this causes a memory leak, but is a quick fix to my issue.
connectionToServer = newServerConn;
// Removed the rest of the code that spawns a new child process
// as it is not needed anymore.
}
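Separately from the quick fix above, one detail in the question's connecttonewserver_1_svc may be worth checking: it writes sizeof(sd) bytes where sd is a pointer, so only a pointer's worth of the struct goes into the pipe. Below is a minimal sketch of passing a whole fixed-size record through a pipe. The ServerData layout and the helper names sendServerData and recvServerData are hypothetical stand-ins, not the asker's real definitions.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <unistd.h>

// Hypothetical stand-in for the ServerData type from the question.
struct ServerData {
    char ip[16];
    std::uint32_t programNum;
};

// Write one whole record to fd; returns true only if every byte was written.
bool sendServerData(int fd, const ServerData *sd) {
    // sizeof(*sd) is the size of the struct; sizeof(sd) would be the pointer size.
    return write(fd, sd, sizeof(*sd)) == static_cast<ssize_t>(sizeof(*sd));
}

// Read one whole record from fd, looping over short reads and EINTR.
bool recvServerData(int fd, ServerData *out) {
    char *p = reinterpret_cast<char *>(out);
    size_t got = 0;
    while (got < sizeof(*out)) {
        ssize_t n = read(fd, p + got, sizeof(*out) - got);
        if (n < 0 && errno == EINTR) continue;  // interrupted, try again
        if (n <= 0) return false;               // error or EOF before a full record
        got += static_cast<size_t>(n);
    }
    return true;
}
```

The reader side checks the return value of read() instead of discarding it, so a short or failed read does not leave the struct partly uninitialized.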

Related

How to track a process on Unix C++?

I have a program(A) that starts another program(B).
What I want is that every time B receives a signal, A sends that signal to B and to all child processes of B. I don't really know how to implement a few things here:
1). How do I determine that a signal was sent to B?
2). How do I save this signal in a variable?
3). How do I loop while B is alive?
int main() {
pid_t pid = fork();
int32_t num = 0;
if (pid == 0) {
static char *argv[] = {"main", NULL};
execv(argv[0], argv); //start program B
}
else{
while(/*B is alive*/){
//if program B receives signal
//I want to send this signal to B and all child processes,
//cause B doesn't handle any signals
if (/*B receives signal*/){
//save this signal to num.
kill(pid, num); //???
//send signal to parent
//useless cause it was already send to B?
fp = popen((("pgrep -P ") + string(num)).c_str(), "r");
//pgrep all child processes
std::vector<int> children;
while (fgets(buf, 128, fp) != NULL) //getting child pid
children.push_back(stoi(string(buf)));
for(auto a : children)
kill(a, num); //send signal to child
}
}
}
return 0;
}
I am afraid your question is really too broad and involves too many topics. I will try to help anyway.
About signal handling: I usually spawn a separate thread in my program that is dedicated to signal handling. That way, I won't "disturb" the main execution.
As for how to handle signals, please have a look at this code snippet:
void * threadSignalHandler (void *arg){ // pthread_create() expects a void *(*)(void *)
int err, signo;
for (;;) {
err = sigwait(&mask, &signo);
if (err != 0) {
syslog(LOG_ERR, "sigwait failed");
exit(1);
}
switch (signo) {
case SIGHUP:
//Do your stuff here
break;
case SIGTERM:
//Do your stuff here
break;
default:
syslog(LOG_INFO, "unexpected signal %d\n", signo);
break;
}
}
return(0);
}
Again, as explained, I usually spawn a new basic thread, and I do it this way:
int err;
pthread_t tid;
/*
* Restore SIGHUP default and block all signals.
*/
sa.sa_handler = SIG_DFL;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
if (sigaction(SIGHUP, &sa, NULL) < 0)
err_quit("can't restore SIGHUP default");
sigfillset(&mask);
if ((err = pthread_sigmask(SIG_BLOCK, &mask, NULL)) != 0)
err_exit(err, "SIG_BLOCK error");
/*
* Create a thread to handle SIGHUP and SIGTERM.
*/
err = pthread_create(&tid, NULL, threadSignalHandler, 0);
if (err != 0)
err_exit(err, "can't create thread");
So, to answer your 3 questions:
A) Use the code I provided; it is tested and I know it works.
B) Just modify the thread handler to store the signal received (variable signo).
C) Please have a look here; there are established ways to do it according to the POSIX standard (Check if process exists given its pid).
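For point C, here is a minimal sketch of two common POSIX checks. The helper names processExists and childStillRunning are made up for illustration. Note that kill(pid, 0) also reports an exited but not yet reaped child (a zombie) as existing, so for a direct child the waitpid() variant is usually the better loop condition.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns true if a process with this pid currently exists.
// Signal 0 performs the existence and permission checks without sending anything.
bool processExists(pid_t pid) {
    if (kill(pid, 0) == 0) return true;
    return errno == EPERM;  // it exists, but we are not allowed to signal it
}

// For a direct child: returns true while the child is still running.
// Returns false once it has exited (the child is reaped and *status is filled in)
// or if waitpid() fails, for example because it is not our child.
bool childStillRunning(pid_t child, int *status) {
    return waitpid(child, status, WNOHANG) == 0;
}
```

In the question's parent loop, the condition could then be something like while (childStillRunning(pid, &status)), with a short sleep or a blocking wait in the body to avoid spinning.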

how can i make my linux service trigger my signal?

I have made my first linux service with C++.
pid_t pid, sid;
pid = fork();
if (pid < 0) {
exit(EXIT_FAILURE);
}
if (pid>0) {
exit(EXIT_SUCCESS);
}
umask(0);
sid = setsid();
if (sid < 0) {
exit(EXIT_FAILURE);
}
if ((chdir("/")) < 0) {
exit(EXIT_FAILURE);
}
close(STDIN_FILENO);
close(STDOUT_FILENO);
close(STDERR_FILENO);
while (1) {
????????
//sleep(10);
}
exit(EXIT_SUCCESS);
What it should do is wait for my signal, do some tasks when it receives it, and then wait for my next signal.
I would send my signal (or whatever) from within my C++ app, which runs on the same machine. It seems like a semaphore mechanism between two apps. But in this case one is a Linux service, and I do not know how the service could wait for my signal.
How could I achieve this? What are my alternatives?
Thanks.
Note: The word "signal" caused confusion. I didn't intend to use that word in its technical sense. I just mean that I need to talk to my Linux service from within my C++ app.
NOTE 2: Using a signal is not useful because doing almost anything in its handler is unsafe, whereas I need to do lots of things. (I don't even know whether I could start a thread!)
Here is an example of a handler that takes care of SIGHUP and SIGTERM. Your program could send these signals using kill -TERM processid or kill -HUP processid (kill -9 sends SIGKILL, which cannot be caught). There are a few other signals you could use for this purpose; check man signal.
void handler (int signal_number){
//action
exit(1);
}
And in the main program
struct sigaction act;
struct sigaction act2;
memset (&act, 0, sizeof (act));
memset (&act2, 0, sizeof (act2));
act.sa_handler = handler;
act2.sa_handler = handler;
if (sigaction (SIGHUP, &act, NULL) < 0) {
perror ("sigaction");
}
if (sigaction (SIGTERM, &act, NULL) < 0) {
perror ("sigaction");
}
//wait here for ever or do something.
Finally I have found the right keywords to google what I needed to know.
Here are the alternative ways to communicate between different processes:
http://www.tldp.org/LDP/lpg/node7.html
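If plain signals are still attractive as the trigger, the concern in NOTE 2 can be sidestepped by not using a handler at all: block the signals and fetch them synchronously with sigwait(), so the work runs as ordinary code. This is only a sketch under assumptions: the function name serviceLoop and the choice of SIGUSR1 as the "do the tasks" request and SIGTERM as the stop request are illustrative, not something from the posts above.

```cpp
#include <cassert>
#include <signal.h>

// Service main loop. handle(signo) runs in normal context, not in a signal
// handler, so it may allocate memory, do I/O, start threads, and so on.
// Returns when SIGTERM is received.
void serviceLoop(void (*handle)(int)) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);   // assumed "please do the tasks" request
    sigaddset(&set, SIGTERM);   // assumed "please stop" request

    // Block both signals so they stay pending until sigwait() collects them.
    // In a multi-threaded program, do this before creating any threads.
    sigprocmask(SIG_BLOCK, &set, nullptr);

    for (;;) {
        int signo = 0;
        if (sigwait(&set, &signo) != 0) continue;  // sleeps until a signal arrives
        if (signo == SIGTERM) break;
        handle(signo);
    }
}
```

The other application would trigger the service with kill(service_pid, SIGUSR1). A signal carries no payload, so if the request needs data, one of the other IPC mechanisms from the link above (a FIFO or a socket, for instance) is probably a better fit.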

Why for each write message from client this program make a new process in C/C++?

I am trying to design a concurrent echo server, meaning that for each client the server forks a child process to handle it. It is for a game server, and each client plays separately. I have come up with the following code, but I have no idea why, each time a message arrives from the client, the server creates a new process and starts again from for(;;){ // Run forever. As I said, I think I must have one process for each client. I expect every process to remain in HandleTCPClient until the client closes its socket.
The other issue is where I can initialize my data so that the child processes share it with each other.
#include "wrappers.h" // socket wrapper fns
#include <sys/wait.h> // for waitpid()
#define RCVBUFSIZE 32 // Size of receive buffer
void HandleTCPClient(int ClntSocket);
extern "C" void SigChldHandler( int Signum );
int i = 0;
int main(int argc, char *argv[])
{
int ServSock; // Socket descriptor for server
int ClntSock; // Socket descriptor for client
unsigned short EchoServPort; // Server port
sockaddr_in EchoServAddr; // Local address
sockaddr_in EchoClntAddr; // Client address
pid_t ProcessID; // Process ID from fork()
unsigned int ChildProcCount = 0; // Number of child processes
EchoServPort = SERV_TCP_PORT; // Local port
// Create socket for incoming connections
ServSock = Socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
// Construct local address structure
memset((char*)&EchoServAddr, 0, sizeof(EchoServAddr)); /* Zero out structure */
EchoServAddr.sin_family = AF_INET; /* Internet address family */
EchoServAddr.sin_addr.s_addr = htonl(INADDR_ANY); /* Any incoming interface */
EchoServAddr.sin_port = htons(EchoServPort); /* Local port */
// Bind to the local address
Bind(ServSock, (sockaddr*)&EchoServAddr, sizeof(EchoServAddr));
// Mark the socket so it will listen for incoming connections
Listen(ServSock, 5);
signal(SIGCHLD, SigChldHandler); // for preventing zombies
for(;;){ // Run forever
// Set the size of the in-out parameter
socklen_t ClntLen = sizeof(EchoClntAddr);
// Wait for a client to connect
ClntSock = Accept(ServSock, (sockaddr*) &EchoClntAddr,&ClntLen);
// Starting point of a new player on the server
// ClntSock is connected to a client!
printf("Handling client %s\n", inet_ntoa(EchoClntAddr.sin_addr));
// Fork child process and report any errors
if ((ProcessID = fork()) < 0){
perror("fork() failed");
exit(1);
}
if (ProcessID == 0){ // If this is the child process
close(ServSock); // Child closes (deallocates) its parent socket descriptor
HandleTCPClient(ClntSock);
exit(1); // Child process terminates
}
printf("With child process: %d\n", (int)ProcessID);
close(ClntSock); // Parent closes (deallocates) its child socket descriptor
ChildProcCount++; // Increment number of outstanding child processes
}
// NOT REACHED
}
void HandleTCPClient(int ClntSocket){
i++;
cout<<"Start of handling"<<endl;
cout<<"i="<<i<<endl;
char EchoBuffer[RCVBUFSIZE]; // Buffer for echo string
int RecvMsgSize; // Size of received message
// Receive message from client
if((RecvMsgSize = recv(ClntSocket, EchoBuffer, RCVBUFSIZE, 0)) < 0){
perror("recv() failed"); exit(1);
cout<<"Error"<<endl;
}
// Send received string and receive again until end of transmission
while(RecvMsgSize > 0){ // zero indicates end of transmission
// Echo message back to client
if(send(ClntSocket, EchoBuffer, RecvMsgSize, 0) != RecvMsgSize){
cout<<"Error"<<endl;
perror("send() failed"); exit(1);
}
// See if there is more data to receive
if((RecvMsgSize = recv(ClntSocket, EchoBuffer, RCVBUFSIZE, 0)) < 0){
cout<<"Error"<<endl;
perror("recv() failed"); exit(1);
}
}
close(ClntSocket); /* Close client socket */
cout<<"End of handling"<<endl;
}
extern "C" void SigChldHandler( int Signum ){
// Catch SIGCHLD signals so child processes don't become zombies.
pid_t pid;
int stat;
while((pid = waitpid(-1, &stat, WNOHANG)) > 0 );
return;
}
Output for three messages from client to server:
Handling client 127.0.0.1
With child process: 40830
Start of handling
i=1
Handling client 127.0.0.1
With child process: 40831
Start of handling
i=1
Handling client 127.0.0.1
With child process: 40832
Start of handling
i=1
Handling client 127.0.0.1
With child process: 40833
Start of handling
i=1
End of handling
End of handling
End of handling
End of handling
As you can see it creates three processes and when I close the program it will close socket for each process!!!
> Edit2 Client side is abstracted:
int main()
{
int Sockfd;
sockaddr_in ServAddr;
char ServHost[] = "localhost";
hostent *HostPtr;
int Port = SERV_TCP_PORT;
//int BuffSize = 0;
//Connection
// get the address of the host
HostPtr = Gethostbyname(ServHost);
if(HostPtr->h_addrtype != AF_INET){
perror("Unknown address type!");
exit(1);
}
memset((char *) &ServAddr, 0, sizeof(ServAddr));
ServAddr.sin_family = AF_INET;
ServAddr.sin_addr.s_addr = ((in_addr*)HostPtr->h_addr_list[0])->s_addr;
ServAddr.sin_port = htons(Port);
//Do some operation
while(!loop){
// open a TCP socket
Sockfd = Socket(AF_INET, SOCK_STREAM, 0);
// connect to the server
Connect(Sockfd, (sockaddr*)&ServAddr, sizeof(ServAddr));
//Prepare message to send server
// write a message to the server
write(Sockfd, data, sizeof(data));
int Len = read(Sockfd, data, 522);
//work on the message from server
}
close(Sockfd);
}
Your client is creating a new socket and connecting it before each write/read, not using the already connected one multiple times. The client should create a socket, connect it to the server and then perform as many write/reads as needed, without creating a new connection.
The server correctly treats each new connection as a new client, and forks to handle it.
Regarding sharing data between forked processes, you could use shared memory as described here.
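As one possible illustration of the shared-memory idea (not necessarily the approach the linked page described), an anonymous shared mapping created before fork() stays visible to parent and children alike. An ordinary global such as the question's i is copied into each child, which is why every child printed i=1. The helper names createSharedCounter and incrementInChild are hypothetical.

```cpp
#include <cassert>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Allocate one int that remains shared between parent and children across fork().
// Returns nullptr on failure.
int *createSharedCounter() {
    void *p = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return nullptr;
    int *counter = static_cast<int *>(p);
    *counter = 0;
    return counter;
}

// Fork a child that increments the counter, then wait for it to finish.
// Returns the value the parent sees afterwards, or -1 if fork() fails.
int incrementInChild(int *counter) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        ++*counter;   // visible to the parent because the mapping is MAP_SHARED
        _exit(0);
    }
    waitpid(pid, nullptr, 0);
    return *counter;
}
```

This sketch runs one child at a time. With several children updating the data concurrently, some synchronization (an atomic, or a process-shared semaphore or mutex placed in the same mapping) would be needed.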
The client calls socket and connect for every message it writes:
while (...) {
socket(...);
connect(...); /* here the server forks a new child process */
write(...);
}
If you want to avoid that, you must move the connection before the loop:
socket(...);
connect(...); /* here the server forks a new child process */
while (...) {
write(...);
}

CAsyncSocket::Close Crashes

Hey, I'm doing some client/server stuff in a Windows service. I'm pretty much new to this.
The problem I'm encountering is that when I try to stop the service through Service Manager, it crashes. I added some MessageBox calls to trace where it crashes, and I found that it crashes when it closes the listener socket!
I tried running the service as a console application and calling, by hand, the function that is called when the SERVICE_CONTROL_STOP event is received, so that I could reproduce the bug and debug it easily. But then it works fine. The Windows service only crashes when I stop it through Service Manager.
Here is some code
The main function
int main(int argc, char* argv[])
{
// Create the service object
CTestService CustomServiceObject;
if (!AfxWinInit(::GetModuleHandle(NULL), NULL, ::GetCommandLine(), 0))
{
std::cerr << "MFC failed to initialize!" << std::endl;
return 1;
}
// Parse for standard arguments (install, uninstall, version etc.)
if (! CustomServiceObject.ParseStandardArgs(argc, argv))
{
// StartService() calls ::StartServiceCtrlDispatcher()
// with the ServiceMain func and stuff
CustomServiceObject.StartService();
}
// When we get here, the service has been stopped
return CustomServiceObject.m_Status.dwWin32ExitCode;
}
The Service Handler callback function
// static member function (callback) to handle commands from the
// service control manager
void CNTService::Handler(DWORD dwOpcode)
{
// Get a pointer to the object
CNTService* pService = m_pThis;
pService->DebugMsg("CNTService::Handler(%lu)", dwOpcode);
switch (dwOpcode) {
case SERVICE_CONTROL_STOP: // 1
pService->SetStatus(SERVICE_STOP_PENDING);
pService->OnStop();
// ..
// ..
// other event handling
// ..
// ..
}
the OnStop() function
void CTestService::OnStop()
{
m_sListener.ShutDown(2);
m_sConnected.ShutDown(2);
MessageBox(NULL, "After Shutdown", NULL, IDOK);
m_sConnected.Close();
MessageBox(NULL, "Closed connected socket", NULL, IDOK);
// crashes here when I try to stop through service manager
// but if I run as console application works fine and terminates successfully
m_sListener.Close();
MessageBox(NULL, "Closed listener socket", NULL, IDOK);
::PostThreadMessage(m_dwThreadID, WM_QUIT, NULL, NULL);
MessageBox(NULL, "After PostThreadMessage", NULL, IDOK);
}
EDIT: If a connection is made (a client connects to the server), the client closes the connection, and then the service is stopped, nothing crashes. It only crashes if the service is stopped while the socket is listening and no connection has been accepted, or while the client has not closed the connection :)
I guess it's clear!
Try adding:-
WSADATA data;
if(!AfxSocketInit(&data))
AfxMessageBox("Failed to Initialize Sockets",MB_OK| MB_ICONSTOP);
to your thread or class initialiser.
The problem is that you're most likely using the socket from multiple threads. Multiple threads and CAsyncSocket do not mix, as noted in the documentation.
Usually you would push the socket into its own worker thread and then signal that thread to close the socket when you need to.

Waitpid equivalent with timeout?

Imagine I have a process that starts several child processes. The parent needs to know when a child exits.
I can use waitpid, but then if/when the parent needs to exit I have no way of telling the thread that is blocked in waitpid to exit gracefully and join it. It's nice to have things clean up themselves, but it may not be that big of a deal.
I can use waitpid with WNOHANG, and then sleep for some arbitrary time to prevent a busy wait. However then I can only know if a child has exited every so often. In my case it may not be super critical that I know when a child exits right away, but I'd like to know ASAP...
I can use a signal handler for SIGCHLD, and in the signal handler do whatever I was going to do when a child exits, or send a message to a different thread to do some action. But using a signal handler obfuscates the flow of the code a little bit.
What I'd really like to do is use waitpid on some timeout, say 5 sec. Since exiting the process isn't a time critical operation, I can lazily signal the thread to exit, while still having it blocked in waitpid the rest of the time, always ready to react. Is there such a call in linux? Of the alternatives, which one is best?
EDIT:
Another method based on the replies would be to block SIGCHLD in all threads with pthread_sigmask(). Then in one thread, keep calling sigtimedwait() while looking for SIGCHLD. This means that I can time out on that call and check whether the thread should exit, and if not, remain blocked waiting for the signal. Once a SIGCHLD is delivered to this thread, we can react to it immediately, inline in the waiting thread, without using a signal handler.
Don't mix alarm() with wait(). You can lose error information that way.
Use the self-pipe trick. This turns any signal into a select()able event:
int selfpipe[2];
void selfpipe_sigh(int n)
{
int save_errno = errno;
(void)write(selfpipe[1], "",1);
errno = save_errno;
}
void selfpipe_setup(void)
{
static struct sigaction act;
if (pipe(selfpipe) == -1) { abort(); }
fcntl(selfpipe[0],F_SETFL,fcntl(selfpipe[0],F_GETFL)|O_NONBLOCK);
fcntl(selfpipe[1],F_SETFL,fcntl(selfpipe[1],F_GETFL)|O_NONBLOCK);
memset(&act, 0, sizeof(act));
act.sa_handler = selfpipe_sigh;
sigaction(SIGCHLD, &act, NULL);
}
Then, your waitpid-like function looks like this:
int selfpipe_waitpid(void)
{
static char dummy[4096];
fd_set rfds;
struct timeval tv;
int died = 0, st;
tv.tv_sec = 5;
tv.tv_usec = 0;
FD_ZERO(&rfds);
FD_SET(selfpipe[0], &rfds);
if (select(selfpipe[0]+1, &rfds, NULL, NULL, &tv) > 0) {
while (read(selfpipe[0],dummy,sizeof(dummy)) > 0);
while (waitpid(-1, &st, WNOHANG) > 0) died++; /* 0 means children remain but none has exited */
}
return died;
}
You can see in selfpipe_waitpid() how you can control the timeout and even mix with other select()-based IO.
Fork an intermediate child, which forks the real child and a timeout process and waits for all (both) of its children. When one exits, it'll kill the other one and exit.
pid_t intermediate_pid = fork();
if (intermediate_pid == 0) {
pid_t worker_pid = fork();
if (worker_pid == 0) {
do_work();
_exit(0);
}
pid_t timeout_pid = fork();
if (timeout_pid == 0) {
sleep(timeout_time);
_exit(0);
}
pid_t exited_pid = wait(NULL);
if (exited_pid == worker_pid) {
kill(timeout_pid, SIGKILL);
} else {
kill(worker_pid, SIGKILL); // Or something less violent if you prefer
}
wait(NULL); // Collect the other process
_exit(0); // Or some more informative status
}
waitpid(intermediate_pid, 0, 0);
Surprisingly simple :)
You can even leave out the intermediate child if you're sure no other module in the program is spawning child processes of its own.
This is an interesting question.
I found that sigtimedwait can do it.
EDIT 2016/08/29:
Thanks for Mark Edington's suggestion. I've tested your example on Ubuntu 16.04, and it works as expected.
Note: this only works for child processes. It's a pity that there seems to be no Linux/Unix equivalent of Windows' WaitForSingleObject(unrelated_process_handle, timeout) for being notified of an unrelated process's termination within a timeout.
OK, Mark Edington's sample code is here:
/* The program creates a child process and waits for it to finish. If a timeout
* elapses the child is killed. Waiting is done using sigtimedwait(). Race
* condition is avoided by blocking the SIGCHLD signal before fork().
*/
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
static pid_t fork_child (void)
{
int p = fork ();
if (p == -1) {
perror ("fork");
exit (1);
}
if (p == 0) {
puts ("child: sleeping...");
sleep (10);
puts ("child: exiting");
exit (0);
}
return p;
}
int main (int argc, char *argv[])
{
sigset_t mask;
sigset_t orig_mask;
struct timespec timeout;
pid_t pid;
sigemptyset (&mask);
sigaddset (&mask, SIGCHLD);
if (sigprocmask(SIG_BLOCK, &mask, &orig_mask) < 0) {
perror ("sigprocmask");
return 1;
}
pid = fork_child ();
timeout.tv_sec = 5;
timeout.tv_nsec = 0;
do {
if (sigtimedwait(&mask, NULL, &timeout) < 0) {
if (errno == EINTR) {
/* Interrupted by a signal other than SIGCHLD. */
continue;
}
else if (errno == EAGAIN) {
printf ("Timeout, killing child\n");
kill (pid, SIGKILL);
}
else {
perror ("sigtimedwait");
return 1;
}
}
break;
} while (1);
if (waitpid(pid, NULL, 0) < 0) {
perror ("waitpid");
return 1;
}
return 0;
}
If your program runs only on contemporary Linux kernels (5.3 or later), the preferred way is to use pidfd_open (https://lwn.net/Articles/789023/ https://man7.org/linux/man-pages/man2/pidfd_open.2.html).
This system call returns a file descriptor representing a process, and then you can select, poll or epoll it, the same way you wait on other types of file descriptors.
For example,
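int fd = pidfd_open(pid, 0); // with older glibc: syscall(SYS_pidfd_open, pid, 0)
struct pollfd pfd = {fd, POLLIN, 0};
bool exited = (poll(&pfd, 1, 1000) == 1); // true if the process exited within 1000 ms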
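A slightly fuller sketch of the same idea, written as a waitpid-with-timeout helper. The name waitpidTimeout is made up here, and the raw syscall() is used on the assumption that the C library may not provide a pidfd_open() wrapper. The fallback syscall number 434 is likewise an assumption that holds for most Linux architectures. On kernels without the call, the function simply returns -1.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434  /* assumed syscall number; older headers lack the macro */
#endif

// Wait up to timeout_ms for child `pid` to exit.
// Returns 1 if it exited (child reaped, *status filled in), 0 on timeout, -1 on error.
int waitpidTimeout(pid_t pid, int timeout_ms, int *status) {
    int fd = static_cast<int>(syscall(SYS_pidfd_open, pid, 0));
    if (fd == -1) return -1;                 // e.g. ENOSYS on kernels before 5.3

    struct pollfd pfd = {fd, POLLIN, 0};
    int ready;
    do {
        ready = poll(&pfd, 1, timeout_ms);   // the pidfd becomes readable on exit
    } while (ready == -1 && errno == EINTR); // note: restarts with the full timeout
    int saved = errno;
    close(fd);
    errno = saved;

    if (ready <= 0) return ready;            // 0 = timeout, -1 = poll() error
    return waitpid(pid, status, 0) == pid ? 1 : -1;  // reap the exited child
}
```

Because the pidfd is an ordinary file descriptor, it could equally be added to an existing select() or epoll set instead of being polled on its own.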
The function can be interrupted with a signal, so you could set a timer before calling waitpid() and it will exit with an EINTR when the timer signal is raised. Edit: It should be as simple as calling alarm(5) before calling waitpid().
I thought that select would return EINTR when SIGCHLD is signaled by one of the children.
I believe this should work:
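while(1)
{
// select() takes five arguments; a SIGCHLD handler must be installed,
// otherwise SIGCHLD is ignored by default and will not interrupt select().
int retval = select(0, NULL, NULL, NULL, &tv);
if (retval == -1 && errno == EINTR) // some signal
{
pid_t pid = waitpid(-1, &st, WNOHANG);
if (pid > 0)
{
// some child exited; handle it here
}
}
else if (retval == 0)
{
// timeout
break;
}
else
{
// error
break;
}
}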
Note: you can use pselect to override current sigmask and avoid interrupts from unneeded signals.
Instead of calling waitpid() directly, you could call sigtimedwait() with SIGCHLD (which is sent to the parent process after the child exits) and wait for it to be delivered to the current thread. As the function name suggests, a timeout parameter is supported.
Please check the following code snippet for details:
static bool waitpid_with_timeout(pid_t pid, int timeout_ms, int* status) {
sigset_t child_mask, old_mask;
sigemptyset(&child_mask);
sigaddset(&child_mask, SIGCHLD);
if (sigprocmask(SIG_BLOCK, &child_mask, &old_mask) == -1) {
printf("*** sigprocmask failed: %s\n", strerror(errno));
return false;
}
timespec ts;
ts.tv_sec = MSEC_TO_SEC(timeout_ms);
ts.tv_nsec = (timeout_ms % 1000) * 1000000;
int ret = TEMP_FAILURE_RETRY(sigtimedwait(&child_mask, NULL, &ts));
int saved_errno = errno;
// Set the signals back the way they were.
if (sigprocmask(SIG_SETMASK, &old_mask, NULL) == -1) {
printf("*** sigprocmask failed: %s\n", strerror(errno));
if (ret == 0) {
return false;
}
}
if (ret == -1) {
errno = saved_errno;
if (errno == EAGAIN) {
errno = ETIMEDOUT;
} else {
printf("*** sigtimedwait failed: %s\n", strerror(errno));
}
return false;
}
pid_t child_pid = waitpid(pid, status, WNOHANG);
if (child_pid != pid) {
if (child_pid != -1) {
printf("*** Waiting for pid %d, got pid %d instead\n", pid, child_pid);
} else {
printf("*** waitpid failed: %s\n", strerror(errno));
}
return false;
}
return true;
}
Refer: https://android.googlesource.com/platform/frameworks/native/+/master/cmds/dumpstate/DumpstateUtil.cpp#46
If you're going to use signals anyways (as per Steve's suggestion), you can just send the signal manually when you want to exit. This will cause waitpid to return EINTR and the thread can then exit. No need for a periodic alarm/restart.
Due to circumstances I absolutely needed this to run in the main thread and it was not very simple to use the self-pipe trick or eventfd because my epoll loop was running in another thread. So I came up with this by scrounging together other stack overflow handlers. Note that in general it's much safer to do this in other ways but this is simple. If anyone cares to comment about how it's really really bad then I'm all ears.
NOTE: It is absolutely necessary to block signal handling in every thread except the one you want to run this in. I do this by default, as I believe it is messy to handle signals in random threads.
static void ctlWaitPidTimeout(pid_t child, useconds_t usec, int *timedOut) {
int rc = -1;
static pthread_mutex_t alarmMutex = PTHREAD_MUTEX_INITIALIZER;
TRACE("ctlWaitPidTimeout: waiting on %lu\n", (unsigned long) child);
/**
* paranoid, in case this was called twice in a row by different
* threads, which could quickly turn very messy.
*/
pthread_mutex_lock(&alarmMutex);
/* set the alarm handler */
struct sigaction alarmSigaction;
struct sigaction oldSigaction;
sigemptyset(&alarmSigaction.sa_mask);
alarmSigaction.sa_flags = 0;
alarmSigaction.sa_handler = ctlAlarmSignalHandler;
sigaction(SIGALRM, &alarmSigaction, &oldSigaction);
/* set alarm, because no alarm is fired when the first argument is 0, 1 is used instead */
ualarm((usec == 0) ? 1 : usec, 0);
/* wait for the child we just killed */
rc = waitpid(child, NULL, 0);
/* if errno == EINTR, the alarm went off, set timedOut to true */
*timedOut = (rc == -1 && errno == EINTR);
/* in case we did not time out, unset the current alarm so it doesn't bother us later */
ualarm(0, 0);
/* restore old signal action */
sigaction(SIGALRM, &oldSigaction, NULL);
pthread_mutex_unlock(&alarmMutex);
TRACE("ctlWaitPidTimeout: timeout wait done, rc = %d, error = '%s'\n", rc, (rc == -1) ? strerror(errno) : "none");
}
static void ctlAlarmSignalHandler(int s) {
TRACE("ctlAlarmSignalHandler: alarm occurred, %d\n", s);
}
EDIT: I've since transitioned to using a solution that integrates well with my existing epoll()-based eventloop, using timerfd. I don't really lose any platform-independence since I was using epoll anyway, and I gain extra sleep because I know the unholy combination of multi-threading and UNIX signals won't hurt my program again.
> I can use a signal handler for SIGCHLD, and in the signal handler do whatever I was going to do when a child exits, or send a message to a different thread to do some action. But using a signal handler obfuscates the flow of the code a little bit.
In order to avoid race conditions, you should avoid doing anything more complex than setting a volatile sig_atomic_t flag in a signal handler.
I think the best option in your case is to send a signal to the parent. waitpid() will then set errno to EINTR and return. At this point you check for waitpid return value and errno, notice you have been sent a signal and take appropriate action.
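A minimal sketch of that pattern: the handler only sets a flag, and it is installed without SA_RESTART so that a blocked waitpid() returns with EINTR. The names gotSignal, onSignal, installFlagHandler and waitInterruptible are illustrative only.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t gotSignal = 0;

// The handler does nothing except record that a signal arrived.
extern "C" void onSignal(int) { gotSignal = 1; }

// Install onSignal for signo. SA_RESTART is deliberately not set, so a
// waitpid() that is blocked when the signal arrives fails with EINTR.
bool installFlagHandler(int signo) {
    struct sigaction sa = {};
    sa.sa_handler = onSignal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, nullptr) == 0;
}

// Returns 1 if the child was reaped, 0 if a signal interrupted the wait, -1 on error.
int waitInterruptible(pid_t child, int *status) {
    pid_t r = waitpid(child, status, 0);
    if (r == child) return 1;
    if (r == -1 && errno == EINTR && gotSignal) return 0;
    return -1;
}
```

One caveat: if the signal arrives just before waitpid() is entered, the flag is set but the call still blocks. Closing that window takes something like the sigtimedwait() or self-pipe approaches from the other answers.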
If a third party library is acceptable then the libkqueue project emulates kqueue (the *BSD eventing system) and provides basic process monitoring with EVFILT_PROC + NOTE_EXIT.
The main advantages of using kqueue or libkqueue are that it's cross-platform and doesn't have the complexity of signal handling. If your program uses async I/O, you may also find it a lower-friction interface than something like epoll and the various *fd functions (signalfd, eventfd, pidfd, etc.).
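#include <errno.h> /* for errno */
#include <stdio.h>
#include <stdint.h>
#include <string.h> /* for strerror */
#include <unistd.h> /* for close */
#include <sys/event.h> /* kqueue header */
#include <sys/types.h> /* for pid_t */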
/* Link with -lkqueue */
int waitpid_timeout(pid_t pid, struct timespec *timeout)
{
struct kevent changelist, eventlist;
int kq, ret;
/* Populate a changelist entry (an event we want to be notified of) */
EV_SET(&changelist, pid, EVFILT_PROC, EV_ADD, NOTE_EXIT, 0, NULL);
kq = kqueue();
/* Call kevent with a timeout */
ret = kevent(kq, &changelist, 1, &eventlist, 1, timeout);
/* Kevent returns 0 on timeout, the number of events that occurred, or -1 on error */
switch (ret) {
case -1:
printf("Error %s\n", strerror(errno));
break;
case 0:
printf("Timeout\n");
break;
case 1:
printf("PID %u exited, status %u\n", (unsigned int)eventlist.ident, (unsigned int)eventlist.data);
break;
}
close(kq);
return ret;
}
Behind the scenes on Linux libkqueue uses either pidfd on Linux kernels >= 5.3 or a waiter thread that listens for SIGCHLD and notifies one or more kqueue instances when a process exits. The second approach is not efficient (it scans PIDs that interest has been registered for using waitid), but that doesn't matter unless you're waiting on large numbers of PIDs.
EVFILT_PROC support has been included in kqueue since its inception, and in libkqueue since v2.5.0.