ZeroMQ handling interrupt in multithreaded application - c++

Graceful exit in ZeroMQ in multithreaded environment
Specs: Ubuntu 16.04 with C++11, libzmq 4.2.3
Sample code
static int s_interrupted = 0;
static void s_signal_handler (int signal_value)
{
s_interrupted = 1;
//some code which will tell main thread to exit
}
static void s_catch_signals (void)
{
struct sigaction action;
action.sa_handler = s_signal_handler;
action.sa_flags = 0;
sigemptyset (&action.sa_mask);
sigaction (SIGINT, &action, NULL);
sigaction (SIGTERM, &action, NULL);
}
static void Threaded(zsock_t *pipe, void *)
{
zmq::context_t context(1);
zmq::socket_t requester1(context,ZMQ_DEALER);
zmq::socket_t requester2(context,ZMQ_DEALER);
requester1.connect(address1);
requester2.connect(address2);
zmq_pollitem_t items []=
{{requester1,0,ZMQ_POLLIN,0},
{requester2,0,ZMQ_POLLIN,0}};
while(true)
{
zmq::message_t message;
zmq::poll (items, 2, -1);
if (items [0].revents & ZMQ_POLLIN)
{
requester1.recv(&message);
}
if (items [1].revents & ZMQ_POLLIN)
{
requester2.recv(&message);
}
}
}
int main()
{
.
//some code
.
zactor_t *actor = zactor_new (Threaded, nullptr);
s_catch_signals();
.
//continue
.
//wait till thread finishes to exit
return 0;
}
Now when the interrupt occurs it will call the signal handler from the main thread. I somehow need to tell the thread (poller) to exit from the signal handler. Any ideas how to achieve this?

The ZeroMQ documentation describes two "idiomatic" ways of dealing with this:
Polling on a pipe, and writing to the pipe in the signal handler.
Catching the exception thrown by recv when a signal arrives.
In my testing, zmq::poll did not throw an exception on SIGINT.
Therefore the solution seems to be a socket dedicated to shutdown.
It looks like this:
#include <iostream>
#include <thread>
#include <signal.h>
#include <zmq.hpp>
zmq::context_t* ctx;
static void s_signal_handler (int signal_value)
{
std::cout << "Signal received" << std::endl;
zmq::socket_t stop_socket(*ctx, ZMQ_PAIR);
stop_socket.connect("inproc://stop_address");
zmq::message_t msg("0", 1);
stop_socket.send(msg);
std::cout << "end sighandler" << std::endl;
}
static void s_catch_signals (void)
{
struct sigaction action;
action.sa_handler = s_signal_handler;
action.sa_flags = 0;
sigemptyset (&action.sa_mask);
sigaction (SIGINT, &action, NULL);
sigaction (SIGTERM, &action, NULL);
}
void thread(void)
{
std::cout << "Thread Begin" << std::endl;
zmq::context_t context (1);
ctx = &context;
zmq::socket_t requester1(context,ZMQ_DEALER);
zmq::socket_t requester2(context,ZMQ_DEALER);
zmq::socket_t stop_socket(context, ZMQ_PAIR);
requester1.connect("tcp://127.0.0.1:36483");
requester2.connect("tcp://127.0.0.1:36483");
stop_socket.bind("inproc://stop_address");
zmq_pollitem_t items []=
{
{requester1,0,ZMQ_POLLIN,0},
{requester2,0,ZMQ_POLLIN,0},
{stop_socket,0,ZMQ_POLLIN,0}
};
while ( true )
{
// Block until a request or the stop message arrives
int rc = 0;
std::cout << "Polling" << std::endl;
rc = zmq::poll (items, 3, -1);
zmq::message_t message;
if(rc > 0)
{
if (items [0].revents & ZMQ_POLLIN)
{
requester1.recv(&message);
}
if (items [1].revents & ZMQ_POLLIN)
{
requester2.recv(&message);
}
if(items [2].revents & ZMQ_POLLIN)
{
std::cout << "message stop received " << std::endl;
break;
}
}
}
requester1.setsockopt(ZMQ_LINGER, 0);
requester2.setsockopt(ZMQ_LINGER, 0);
stop_socket.setsockopt(ZMQ_LINGER, 0);
requester1.close();
requester2.close();
stop_socket.close();
std::cout << "Thread end" << std::endl;
}
int main(void)
{
std::cout << "Begin" << std::endl;
s_catch_signals ();
zmq::context_t context (1);
zmq::socket_t router(context,ZMQ_ROUTER);
router.bind("tcp://127.0.0.1:36483");
std::thread t(&thread);
t.join();
std::cout << "end join" << std::endl;
}
Note that if you do not want to share the context with the signal handler, you could use an "ipc://..." endpoint instead.
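The first option in the list above, polling on a pipe, avoids creating sockets or using std::cout inside the handler, neither of which is guaranteed to be async-signal-safe. Below is a minimal sketch using plain POSIX calls. The names g_stop_pipe, on_stop_signal, install_stop_pipe and stop_requested are made up for illustration, and poll() stands in for zmq::poll here (a zmq_pollitem_t can watch a raw descriptor through its fd field when its socket field is null).

```cpp
#include <cerrno>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical names throughout; this is a sketch, not the answer's code.
static int g_stop_pipe[2] = {-1, -1};

// Handler: write() is async-signal-safe, so this is all it does.
extern "C" void on_stop_signal(int) {
    const int saved_errno = errno;
    const char byte = 1;
    ssize_t ignored = write(g_stop_pipe[1], &byte, 1);
    (void)ignored;
    errno = saved_errno;
}

// Creates the pipe and installs the handler for one signal.
bool install_stop_pipe(int signo) {
    if (pipe(g_stop_pipe) != 0) return false;
    // Non-blocking write end, so the handler can never stall on a full pipe.
    const int flags = fcntl(g_stop_pipe[1], F_GETFL);
    fcntl(g_stop_pipe[1], F_SETFL, flags | O_NONBLOCK);
    struct sigaction action{};
    action.sa_handler = on_stop_signal;
    sigemptyset(&action.sa_mask);
    return sigaction(signo, &action, nullptr) == 0;
}

// Polls the read end; in the worker this item would sit next to the sockets.
bool stop_requested(int timeout_ms) {
    pollfd item{g_stop_pipe[0], POLLIN, 0};
    int rc;
    do {
        rc = poll(&item, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    return rc > 0 && (item.revents & POLLIN) != 0;
}
```

In the worker loop, the read end of the pipe would take the place of stop_socket in the items array, and the loop would break when that item reports ZMQ_POLLIN.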

If you wish to preserve the feel of ZMQ's Actor model in handling signals, you could use the signalfd interface on Linux (see the signalfd man page). That way you can use zmq poll to wait for the signal to be delivered instead of installing a signal handler.
It has the added advantage that a signal delivered through a file descriptor is handled synchronously rather than asynchronously, so you can call any function you like while handling it.
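A minimal, Linux-only sketch of the signalfd approach follows. The helper names open_signal_fd and read_signal are hypothetical, and poll() is used in place of zmq::poll to keep the example self-contained; the descriptor could instead go in the fd field of a zmq_pollitem_t. The signal has to be blocked in every thread (block it before starting any threads, since new threads inherit the mask), otherwise a handler or the default action may run instead.

```cpp
#include <poll.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

// Blocks `signo` for the calling thread and returns a descriptor that
// becomes readable when that signal is pending. Returns -1 on failure.
int open_signal_fd(int signo) {
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, signo);
    if (pthread_sigmask(SIG_BLOCK, &mask, nullptr) != 0) return -1;
    return signalfd(-1, &mask, SFD_CLOEXEC);
}

// Waits up to timeout_ms; returns the signal number, 0 on timeout, -1 on error.
int read_signal(int sfd, int timeout_ms) {
    pollfd item{sfd, POLLIN, 0};
    if (poll(&item, 1, timeout_ms) <= 0) return 0;
    signalfd_siginfo info{};
    if (read(sfd, &info, sizeof info) != static_cast<ssize_t>(sizeof info)) {
        return -1;
    }
    return static_cast<int>(info.ssi_signo);
}
```

Because the signal is consumed by an ordinary read(), the code that reacts to it runs in normal thread context and is not restricted to async-signal-safe functions.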

Related

asio::io_context run in thread, asio::steady_timer::async_wait doesn't work

I am trying to create a server with asio and want to integrate a timer alongside the handlers for client events.
asio::io_context m_asioContext;
std::thread m_threadContext;
void print()
{
std::cout << "Hello, world!" << std::endl;
SendTimer();
}
void SendTimer()
{
asio::steady_timer timer(m_asioContext, asio::chrono::seconds(2));
timer.async_wait(boost::bind(&server_interface::print, this));
}
bool Start()
{
try
{
// Issue a task to the asio context - This is important
// as it will prime the context with "work", and stop it
// from exiting immediately. Since this is a server, we
// want it primed ready to handle clients trying to
// connect.
WaitForClientConnection();
std::cout << "[SERVER] Started!azazaz\n";
// Launch the asio context in its own thread
m_threadContext = std::thread([this]() { m_asioContext.run(); });
}
catch (std::exception& e)
{
// Something prohibited the server from listening
std::cerr << "[SERVER] Exception: " << e.what() << "\n";
return false;
}
std::cout << "[SERVER] Started!\n";
return true;
}
void Update(size_t nMaxMessages = -1, bool bWait = false)
{
if (bWait) m_qMessagesIn.wait();
// Process as many messages as you can up to the value
// specified
size_t nMessageCount = 0;
while (nMessageCount < nMaxMessages && !m_qMessagesIn.empty())
{
// Grab the front message
auto msg = m_qMessagesIn.pop_front();
// Pass to message handler
OnMessage(msg.remote, msg.msg);
nMessageCount++;
}
Update(nMaxMessages, bWait);
}
Server call
CustomServer server(60000);
server.Start();
asio::io_context io;
server.Update(-1, true);
It seems that the timer does not run correctly; the program behaves like an infinite loop. I am really new to asio, so I wonder how to handle multiple events with only one thread.
Thanks for your answer.

Do I have to block signals on the main thread to handle cancel point on another thread?

When I was working on a TCP server running in a dedicated thread, I noticed strange behavior in signal handling. I have prepared the following MWE (I've used cerr to avoid race condition on debug printing):
#include <signal.h>
#include <unistd.h>
#include <iostream>
#include <thread>
#include <chrono>
using namespace std;
#undef THREAD
class RaiiObject
{
public:
RaiiObject() { cerr << "RaiiObject ctor" << endl; }
~RaiiObject() { cerr << "RaiiObject dtor" << endl; }
};
static void signalHandler(int sig)
{
write(2, "Signal\n", 7);
}
static void blockSigint()
{
sigset_t blockset;
sigemptyset(&blockset);
sigaddset(&blockset, SIGINT);
sigprocmask(SIG_BLOCK, &blockset, NULL);
}
static void setSigintHandler()
{
struct sigaction sa;
sa.sa_handler = signalHandler;
sa.sa_flags = 0;
sigemptyset(&sa.sa_mask);
sigaction(SIGINT, &sa, NULL);
}
void runSelect()
{
sigset_t emptyset;
sigemptyset(&emptyset);
setSigintHandler();
RaiiObject RaiiObject{};
fd_set fdRead;
while (true) {
cerr << "Loop iteration" << endl;
FD_ZERO(&fdRead);
FD_SET(0, &fdRead);
while (true) {
if (pselect(FD_SETSIZE, &fdRead, NULL, NULL, NULL, &emptyset) > 0) {
cerr << "Select" << endl;
} else {
cerr << "Select break" << endl;
return;
}
}
}
}
int main()
{
cerr << "Main start" << endl;
#ifdef THREAD
cerr << "Thread start" << endl;
//blockSigint();
thread{runSelect}.join();
#else
runSelect();
#endif
cerr << "Main exit" << endl;
return EXIT_SUCCESS;
}
When I compile a single-threaded program (#undef THREAD), I can correctly terminate the runSelect() function with Ctrl-C:
Main start
RaiiObject ctor
Loop iteration
^CSignal
Select break
RaiiObject dtor
Main exit
But when I compile a multithreaded (#define THREAD) program, it hangs at the signal handler:
Main start
RaiiObject ctor
Loop iteration
^CSignal
Only when I block the signal on the main thread with blockSigint() does the program work again as I want.
I've examined the program with strace -tt -f and I noticed that working versions use pselect6() with ERESTARTNOHAND:
14:46:53.543360 write(2, "Loop iteration", 14Loop iteration) = 14
14:46:53.543482 write(2, "\n", 1
) = 1
14:46:53.543586 pselect6(1024, [0], NULL, NULL, NULL, {[], 8}) = ? ERESTARTNOHAND (To be restarted if no handler)
14:46:55.286989 --- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=2707461, si_uid=1000} ---
14:46:55.287120 write(2, "Signal\n", 7Signal
) = 7
14:46:55.287327 rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
14:46:55.287569 write(2, "Select break", 12Select break) = 12
14:46:55.287760 write(2, "\n", 1
but in the broken version the signal lands in the thread that is waiting in futex():
[pid 3469011] 14:48:37.211792 write(2, "Loop iteration", 14Loop iteration) = 14
[pid 3469011] 14:48:37.211916 write(2, "\n", 1
) = 1
[pid 3469011] 14:48:37.212031 pselect6(1024, [0], NULL, NULL, NULL, {[], 8} <unfinished ...>
[pid 3469010] 14:48:40.046146 <... futex resumed>) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
[pid 3469010] 14:48:40.046256 --- SIGINT {si_signo=SIGINT, si_code=SI_USER, si_pid=2707461, si_uid=1000} ---
[pid 3469010] 14:48:40.046354 write(2, "Signal\n", 7Signal
) = 7
[pid 3469010] 14:48:40.046588 rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
[pid 3469010] 14:48:40.046821 futex(0x7f4e5c16b9d0, FUTEX_WAIT, 3469011, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
Do I have to block signals on the main thread to handle cancel point on another thread?
You need to allow (unmask) signals in only those threads expected to handle them, and block them in others.
The OS will deliver a process-directed signal to any thread that can receive it. Your terminal's SIGINT is sent to each process in the foreground process group, and the OS decides which thread of each will receive it.
If you only have two threads, and one of them has atomically unmasked SIGINT in a pselect while the other has SIGINT blocked, then the OS will deliver the SIGINT to the former. If both (or neither) can handle a SIGINT, the OS will pick one of them.
Caveat: consider a SIGINT generated while both threads have INT masked:
time  thr1         thr2
----  ----------   ---------
0     block(INT)   -
1     run thread   (awake)     <---- SIGINT
3     join()       pselect()
4     ...          ...
If the signal arrives outside of thr2's pselect, the OS will find that both threads have the signal blocked. POSIX specifies that a process-directed signal in this state remains pending on the process until some thread unblocks it or calls sigwait for it, so thr2 should still receive it on its next pselect. What can be lost is a repeat: standard signals are not queued, so several SIGINTs arriving while one is already pending collapse into a single delivery.
That may be fine for your application, or it may not be.
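The arrangement described in this answer (block the signal everywhere, unmask it only inside pselect in the thread that should handle it) can be sketched as follows. The function names are hypothetical, SIGUSR1 or any other signal can be passed in place of SIGINT, and the five-second timeout exists only so the demo cannot hang; the code in the question waits without a timeout.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/select.h>
#include <thread>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t g_handled = 0;

extern "C" void note_signal(int) { g_handled = 1; }

// Worker side: the signal is blocked on entry (mask inherited from the
// creating thread) and is unblocked only for the duration of pselect().
bool wait_for_signal(int signo) {
    sigset_t during_wait;
    pthread_sigmask(SIG_SETMASK, nullptr, &during_wait);  // query current mask
    sigdelset(&during_wait, signo);
    timespec limit{5, 0};  // safety net for the demo only
    const int rc = pselect(0, nullptr, nullptr, nullptr, &limit, &during_wait);
    return rc == -1 && errno == EINTR;
}

// Main side: install the handler, block the signal, then start the worker.
bool run_masked_demo(int signo) {
    struct sigaction sa{};
    sa.sa_handler = note_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, nullptr) != 0) return false;

    sigset_t blocked;
    sigemptyset(&blocked);
    sigaddset(&blocked, signo);
    pthread_sigmask(SIG_BLOCK, &blocked, nullptr);  // before thread creation

    bool interrupted = false;
    std::thread worker([&] { interrupted = wait_for_signal(signo); });
    kill(getpid(), signo);  // process-directed, like Ctrl-C from a terminal
    worker.join();
    return interrupted && g_handled == 1;
}
```

Since the main thread never unblocks the signal, only the worker can run the handler, and its pselect returns -1 with errno set to EINTR.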
As you noticed, my problem was that sigaction() installs the handler for the whole process, so both the main() thread and the runSelect() thread could receive SIGINT, and it was being caught by main().
I have now prepared a version in which only the main thread handles SIGINT and then sends SIGUSR1 to the specific thread with pthread_kill().
#include <signal.h>
#include <unistd.h>
#include <iostream>
#include <thread>
#include <chrono>
using namespace std;
pthread_t nativeHandle;
class RaiiObject
{
public:
RaiiObject() { cerr << "RaiiObject ctor" << endl; }
~RaiiObject() { cerr << "RaiiObject dtor" << endl; }
};
static void sigintHandler(int)
{
write(2, "INT\n", 4);
pthread_kill(nativeHandle, SIGUSR1);
}
static void sigusrHandler(int)
{
write(2, "USR\n", 4);
}
static void blockSigint()
{
sigset_t blockset;
sigemptyset(&blockset);
sigaddset(&blockset, SIGINT);
sigprocmask(SIG_BLOCK, &blockset, NULL);
}
static void setSigintHandler()
{
struct sigaction sa;
sa.sa_handler = sigintHandler;
sa.sa_flags = 0;
sigemptyset(&sa.sa_mask);
sigaction(SIGINT, &sa, NULL);
}
static void setSigusrHandler()
{
struct sigaction sa;
sa.sa_handler = sigusrHandler;
sa.sa_flags = 0;
sigemptyset(&sa.sa_mask);
sigaction(SIGUSR1, &sa, NULL);
}
void runSelect()
{
sigset_t emptyset;
sigemptyset(&emptyset);
blockSigint();
setSigusrHandler();
RaiiObject RaiiObject{};
fd_set fdRead;
while (true) {
cerr << "Loop iteration" << endl;
FD_ZERO(&fdRead);
FD_SET(0, &fdRead);
while (true) {
if (pselect(FD_SETSIZE, &fdRead, NULL, NULL, NULL, &emptyset) > 0) {
cerr << "Select" << endl;
return;
} else {
cerr << "Select break" << endl;
return;
}
}
}
}
int main()
{
cerr << "Main start" << endl;
cerr << "Thread start" << endl;
thread runSelectThread{runSelect};
nativeHandle = runSelectThread.native_handle();
setSigintHandler();
runSelectThread.join();
cerr << "Main exit" << endl;
return EXIT_SUCCESS;
}
Main start
Thread start
RaiiObject ctor
Loop iteration
^CINT
USR
Select break
RaiiObject dtor
Main exit

Force blocking syscall of other thread to return and set errno to EINTR

Please consider the following example source code:
void tfunc()
{
// Some blocking syscall that sets errno
if (errno == EINTR)
{
std::cout << "cleanup" << std::endl;
return;
}
// Do some other stuff
}
int main(int argc, char *argv[])
{
std::thread t(tfunc);
sleep(10);
return 0;
}
Is it possible, from another thread, to have the syscall, for example accept() return and set errno to EINTR? If yes, how?
I suggest you use:
non-blocking operations
poll() (or select() or epoll())
a pipe
Before you spawn your thread, you set up a pipe which will carry an "interrupt message". In your thread tfunc you set up poll such that it waits on both the file descriptor (socket) you want to work on and the read end of the pipe.
To interrupt the thread, you simply write an "interrupt message" to the write end of the pipe; in the thread, check on return from poll whether the pipe has data to read.
A small demo, with no error handling and no handling of signals, just to visualize what I mean:
#include <cassert>
#include <chrono>
#include <iostream>
#include <thread>
#include <poll.h>
#include <unistd.h>
int fd[2];
void the_blocking_thread(void)
{
pollfd pollfds[2];
pollfds[0].fd = fd[0];
pollfds[0].events = POLLIN;
pollfds[1].fd = -99; // add here your socket / fd
pollfds[1].events = POLLIN; // or whatever you need
std::cout << "waiting for \"interrupt message\" or real work on fd" << std::endl;
int ret = poll(pollfds, 2, -1);
assert(ret > 0);
if (pollfds[0].revents != 0) {
std::cout << "cleanup" << std::endl;
return;
}
// Non blocking call on your fd
// Some other stuff
}
int main() {
int ret = pipe(fd);
assert(ret == 0);
std::cout << "Starting thread" << std::endl;
std::thread t(the_blocking_thread);
std::chrono::seconds timespan(1); // or whatever
std::this_thread::sleep_for(timespan);
std::cout << "Sending \"interrupt\" message" << std::endl;
char dummy = 42;
ret = write (fd[1], &dummy, 1);
assert(ret == 1);
t.join();
}
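For completeness, the literal request in the question (make a blocking call in another thread fail with EINTR) can be sketched with pthread_kill and a handler installed without SA_RESTART. All names below are hypothetical, and read() on an empty pipe stands in for accept(). The approach is racy by nature: a signal that arrives before the thread has entered the system call interrupts nothing, which is why the sketch keeps sending until the worker reports back, and why the pipe-based design above is usually easier to get right.

```cpp
#include <atomic>
#include <cerrno>
#include <chrono>
#include <pthread.h>
#include <signal.h>
#include <thread>
#include <unistd.h>

// Empty on purpose: the handler only has to exist so the signal interrupts
// the system call instead of terminating the process.
extern "C" void wake_handler(int) {}

// Blocks in read() and returns the errno it failed with (0 if it succeeded).
int blocking_read(int fd) {
    char byte;
    const ssize_t n = read(fd, &byte, 1);
    return n < 0 ? errno : 0;
}

int interrupt_blocked_thread() {
    struct sigaction sa{};
    sa.sa_handler = wake_handler;  // sa_flags stays 0, so no SA_RESTART
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, nullptr) != 0) return -1;

    int fds[2];
    if (pipe(fds) != 0) return -1;  // nothing is ever written, so read() blocks

    std::atomic<bool> done{false};
    int seen_errno = 0;
    std::thread worker([&] {
        seen_errno = blocking_read(fds[0]);
        done = true;
    });

    // Keep nudging the worker until it has actually been interrupted.
    while (!done) {
        pthread_kill(worker.native_handle(), SIGUSR1);
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    worker.join();
    close(fds[0]);
    close(fds[1]);
    return seen_errno;
}
```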

Blocking signals causes boost process not to work

In the code below the class Process can run a process using boost process in asynchronous mode and can kill it if it times out. Now in order to shut it down, I block all the signals in all threads and create a specific thread signal_thread to handle signals. On doing this the program stops working. I guess this is probably because the parent process can no longer receive the signal SIGCHLD and know that the child process has finished executing.
#include <atomic>
#include <iostream>
#include <csignal>
#include <functional>
#include <thread>
#include <chrono>
#include <future>
#include <boost/process.hpp>
#include <boost/asio.hpp>
namespace bp = boost::process;
std::atomic<bool> stop(false);
class Process
{
public:
Process(const std::string& cmd, const int timeout);
void run();
private:
void timeoutHandler(const boost::system::error_code& ec);
void kill();
const std::string command;
const int timeout;
bool stopped;
boost::process::group group;
boost::asio::io_context ioc;
boost::asio::deadline_timer deadline_timer;
unsigned returnStatus;
};
Process::Process(const std::string& cmd, const int timeout):
command(cmd),
timeout(timeout),
stopped(false),
ioc(),
deadline_timer(ioc),
returnStatus(0)
{}
void Process::timeoutHandler(const boost::system::error_code& ec)
{
if (stopped || ec == boost::asio::error::operation_aborted)
{
return;
}
std::cout << "Time Up!" << std::endl;
kill();
deadline_timer.expires_at(boost::posix_time::pos_infin);
}
void Process::run()
{
std::future<std::string> dataOut;
std::future<std::string> dataErr;
std::cout << "Running command: "<< command << std::endl;
bp::child c(command, bp::std_in.close(),
bp::std_out > dataOut,
bp::std_err > dataErr, ioc,
group,
bp::on_exit([=](int e, const std::error_code& ec) {
std::cout << "on_exit: " << ec.message() << " -> "<< e << std::endl;
deadline_timer.cancel();
returnStatus = e;
}));
deadline_timer.expires_from_now(boost::posix_time::seconds(timeout));
deadline_timer.async_wait(std::bind(&Process::timeoutHandler, this, std::placeholders::_1));
ioc.run();
c.wait();
std::cout << "returnStatus "<< returnStatus << std::endl;
std::cout << "stdOut "<< dataOut.get() << std::endl;
std::cout << "stdErr "<< dataErr.get() << std::endl;
}
void Process::kill()
{
std::error_code ec;
group.terminate(ec);
if(ec)
{
std::cerr << "Error occurred while trying to kill the process: " << ec.message() << std::endl;
throw std::runtime_error(ec.message());
}
std::cout << "Killed the process and all its descendants" << std::endl;
stopped = true;
}
void myfunction()
{
while(true)
{
Process p("date", 3600);
p.run();
std::this_thread::sleep_for(std::chrono::milliseconds(3000));
if(stop)
break;
}
}
int main() {
sigset_t sigset;
sigfillset(&sigset);
::pthread_sigmask(SIG_BLOCK, &sigset, nullptr);
std::thread signal_thread([]() {
while(true)
{
sigset_t sigset;
sigfillset(&sigset);
int signo = ::sigwaitinfo(&sigset, nullptr);
if(-1 == signo)
std::abort();
std::cout << "Received signal " << signo << '\n';
if(signo != SIGCHLD)
{
break;
}
}
stop = true;
});
myfunction();
signal_thread.join();
}
Please suggest how I can shut down the program using the signal-handling thread while still making the program work correctly.
Thinking more about it, I suggest blocking only signals that you intend for that signal thread to handle, such as SIGINT and SIGTERM:
sigset_t sigset;
sigemptyset(&sigset);
sigaddset(&sigset, SIGINT);
sigaddset(&sigset, SIGTERM);
::pthread_sigmask(SIG_BLOCK, &sigset, nullptr);
std::thread signal_thread([sigset]() { // Use the same sigset.
// ...
int signo = ::sigwaitinfo(&sigset, nullptr);
// ...
});
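A fuller sketch of that suggestion, without the Boost parts, might look like the following. The names stop_flag, block_shutdown_signals and start_signal_thread are hypothetical, and sigwait is used here instead of sigwaitinfo purely to keep the example short. SIGCHLD is deliberately left unblocked so that child-exit notification keeps working.

```cpp
#include <atomic>
#include <signal.h>
#include <thread>
#include <unistd.h>

std::atomic<bool> stop_flag{false};

// Call on the main thread before any other thread is started, so that
// every thread inherits a mask with SIGINT and SIGTERM blocked.
sigset_t block_shutdown_signals() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    pthread_sigmask(SIG_BLOCK, &set, nullptr);
    return set;
}

// Dedicated thread: waits synchronously for one of the blocked signals,
// records which one arrived, and raises the stop flag.
std::thread start_signal_thread(sigset_t set, std::atomic<int>& received) {
    return std::thread([set, &received] {
        int signo = 0;
        if (sigwait(&set, &signo) == 0) {
            received = signo;
            stop_flag = true;
        }
    });
}
```

The worker loop would then check stop_flag between iterations, as myfunction() does with stop in the question.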

ZeroMq PUB/SUB pattern not working properly

My Requirements:
High throughput, at least 5000 messages per second
Order of delivery not important
Publisher, as obvious, should not wait for a response and should not care if a Subscriber is listening or not
Background:
I am creating a new thread for every message because, if I don't, the message-generation part outpaces the sending thread and messages get lost, so a thread per message seemed to be the right approach.
Problem:
The problem is that the threads started to send the zMQ messages never terminate (they do not exit or finish). The culprit seems to be the following line:
s_send(*client, request.str());
If I remove it, the threads terminate fine, so this line is probably causing the problem. My first guess was that the thread is waiting for a response, but does a zmq_PUB wait for a response?
Here is my code:
void *SendHello(void *threadid) {
long tid;
tid = (long) threadid;
//cout << "Hello World! Thread ID, " << tid << endl;
std::stringstream request;
//writing the hex as request to be sent to the server
request << tid;
s_send(*client, request.str());
pthread_exit(NULL);
}
int main() {
int sequence = 0;
int NUM_THREADS = 1000;
while (1) {
pthread_t threads[NUM_THREADS];
int rc;
int i;
for (i = 0; i < NUM_THREADS; i++) {
cout << "main() : creating thread, " << i << endl;
rc = pthread_create(&threads[i], NULL, SendHello, (void *) i);
pthread_detach(threads[i]);
sched_yield();
if (rc) {
cout << "Error:unable to create thread," << rc << endl;
exit(-1);
}
}
//usleep(1000);
sleep(1);
}
pthread_exit(NULL);
//delete client;
return 0;
}
My Question:
Do I need to tweak the zMQ sockets so that the PUB doesn't wait for a reply? What am I doing wrong?
Edit:
Adding client definition:
static zmq::socket_t * s_client_socket(zmq::context_t & context) {
std::cout << "I: connecting to server." << std::endl;
zmq::socket_t * client = new zmq::socket_t(context, ZMQ_SUB);
client->connect("tcp://localhost:5555");
// Configure socket to not wait at close time
int linger = 0;
client->setsockopt(ZMQ_LINGER, &linger, sizeof (linger));
return client;
}
zmq::context_t context(1);
zmq::socket_t * client = s_client_socket(context);
but does a zmq_PUB wait for a response?
No. A send could block if your socket wasn't a PUB socket and you hit the high-water mark, but that isn't the case here. Do the messages get sent?
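Independently of the blocking question, ZeroMQ sockets are not thread-safe, so sharing the single client socket among a thousand threads is a likely source of trouble. One possible alternative, sketched below under that assumption, is to keep one sender thread that owns the socket and to feed it through a queue. SerialSender and its members are hypothetical names, and the send callback is where a call such as s_send(*client, message) would go.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Queues messages from any thread and hands them, one at a time and in
// order, to a callback that runs on a single dedicated thread.
class SerialSender {
public:
    using SendFn = std::function<void(const std::string&)>;

    explicit SerialSender(SendFn send)
        : send_(std::move(send)), worker_([this] { run(); }) {}

    // Drains whatever is still queued, then stops the worker.
    ~SerialSender() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_one();
        worker_.join();
    }

    void post(std::string message) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push(std::move(message));
        }
        ready_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            ready_.wait(lock, [this] { return closed_ || !pending_.empty(); });
            if (pending_.empty()) return;  // closed and fully drained
            std::string message = std::move(pending_.front());
            pending_.pop();
            lock.unlock();
            send_(message);  // only this thread ever touches the socket
            lock.lock();
        }
    }

    SendFn send_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> pending_;
    bool closed_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

The queue in this sketch is unbounded, so a producer that stays faster than the sender will grow memory use instead of dropping messages; whether that trade-off is acceptable depends on the application.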