boost::asio::ip::tcp::socket is connected? - c++

I want to verify the connection status before performing read/write operations.
Is there a way to make an isConnect() method?
I saw this, but it seems "ugly".
I have tested the is_open() function as well, but it doesn't have the expected behavior.

TCP is meant to be robust in the face of a harsh network; even though TCP provides what looks like a persistent end-to-end connection, it's all just a lie, each packet is really just a unique, unreliable datagram.
The connections are really just virtual conduits created with a little state tracked at each end of the connection (Source and destination ports and addresses, and local socket). The network stack uses this state to know which process to give each incoming packet to and what state to put in the header of each outgoing packet.
Because of the underlying — inherently connectionless and unreliable — nature of the network, the stack will only report a severed connection when the remote end sends a FIN packet to close the connection, or if it doesn't receive an ACK response to a sent packet (after a timeout and a couple retries).
Because of the asynchronous nature of asio, the easiest way to be notified of a graceful disconnection is to have an outstanding async_read, which will return error::eof immediately when the connection is closed. But this alone still leaves the possibility of other problems, like half-open connections and network failures, going undetected.
The most effective way to work around unexpected connection interruption is to use some sort of keep-alive or ping. This occasional attempt to transfer data over the connection allows expedient detection of an unintentionally severed connection.
The TCP protocol actually has a built-in keep-alive mechanism, which can be configured in asio using asio::tcp::socket::keep_alive. The nice thing about TCP keep-alive is that it's transparent to the user-mode application, and only the peers interested in keep-alive need to configure it. The downside is that you need OS-level access/knowledge to configure the timeout parameters; they're unfortunately not exposed via a simple socket option and usually have default timeout values that are quite large (7200 seconds on Linux).
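For illustration, a minimal sketch of turning the option on (the probe timing itself then comes from the OS-wide defaults mentioned above, not from asio):
#include <boost/asio.hpp>

// Sketch only: ask the stack to probe this connection with TCP keep-alives.
// The probe timing comes from OS-wide defaults (on Linux, e.g.
// /proc/sys/net/ipv4/tcp_keepalive_time), not from anything asio exposes.
void enable_tcp_keep_alive(boost::asio::ip::tcp::socket& socket)
{
    socket.set_option(boost::asio::socket_base::keep_alive(true));
}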
Probably the most common method of keep-alive is to implement it at the application layer, where the application has a special noop or ping message and does nothing but respond when tickled. This method gives you the most flexibility in implementing a keep-alive strategy.
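As a rough sketch (the function name and the "PING\n" message are invented for illustration), the sending side of such an application-level keep-alive with asio could look something like this; real detection would also arm a deadline for the peer's reply, which is omitted here:
#include <boost/asio.hpp>
#include <boost/asio/steady_timer.hpp>
#include <chrono>
#include <iostream>

// Hypothetical ping loop: write a small ping message every few seconds.
// A write error (now or on a later attempt) indicates the connection is gone;
// a complete implementation would also time out waiting for the peer's reply.
void ping_loop(boost::asio::ip::tcp::socket& s, boost::asio::steady_timer& timer)
{
    timer.expires_from_now(std::chrono::seconds(5));
    timer.async_wait([&](const boost::system::error_code&) {
        boost::system::error_code ec;
        boost::asio::write(s, boost::asio::buffer("PING\n", 5), ec);
        if (ec) {
            std::cerr << "Connection lost: " << ec.message() << std::endl;
            return;               // stop pinging; reconnect or clean up here
        }
        ping_loop(s, timer);      // schedule the next ping
    });
}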

TCP promises to watch for dropped packets -- retrying as appropriate -- to give you a reliable connection, for some definition of reliable. Of course TCP can't handle cases where the server crashes, or your Ethernet cable falls out or something similar occurs. Additionally, knowing that your TCP connection is up doesn't necessarily mean that a protocol that will go over the TCP connection is ready (eg., your HTTP webserver or your FTP server may be in some broken state).
If you know the protocol being spoken over TCP, then there is probably a way within that protocol to tell you whether things are in good shape (for HTTP it would be a HEAD request).

If you are sure that the remote socket has not sent anything (e.g. because you haven't sent a request to it yet), then you can set your local socket to a non blocking mode and try to read one or more bytes from it.
Given that the server hasn't sent anything, you'll either get an asio::error::would_block or some other error. If the former, your local socket has not yet detected a disconnection. If the latter, your socket has been closed.
Here is an example code:
#include <iostream>
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <boost/asio/steady_timer.hpp>

using namespace std;
using namespace boost;

using tcp = asio::ip::tcp;

template<class Duration>
void async_sleep(asio::io_service& ios, Duration d, asio::yield_context yield)
{
    auto timer = asio::steady_timer(ios);
    timer.expires_from_now(d);
    timer.async_wait(yield);
}

int main()
{
    asio::io_service ios;

    tcp::acceptor acceptor(ios, tcp::endpoint(tcp::v4(), 0));

    boost::asio::spawn(ios, [&](boost::asio::yield_context yield) {
        tcp::socket s(ios);
        acceptor.async_accept(s, yield);
        // Keep the socket from going out of scope for 5 seconds.
        async_sleep(ios, chrono::seconds(5), yield);
    });

    boost::asio::spawn(ios, [&](boost::asio::yield_context yield) {
        tcp::socket s(ios);
        s.async_connect(acceptor.local_endpoint(), yield);

        // This is essential to make the `read_some` function not block.
        s.non_blocking(true);

        while (true) {
            system::error_code ec;
            char c;
            // Unfortunately, this only works when the buffer has non
            // zero size (tested on Ubuntu 16.04).
            s.read_some(asio::mutable_buffer(&c, 1), ec);

            if (ec && ec != asio::error::would_block) break;

            cerr << "Socket is still connected" << endl;
            async_sleep(ios, chrono::seconds(1), yield);
        }

        cerr << "Socket is closed" << endl;
    });

    ios.run();
}
And the output:
Socket is still connected
Socket is still connected
Socket is still connected
Socket is still connected
Socket is still connected
Socket is closed
Tested on:
Ubuntu: 16.04
Kernel: 4.15.0-36-generic
Boost: 1.67
Though, I don't know whether or not this behavior depends on any of those versions.

You can send a dummy byte on a socket and see if it returns an error.
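A rough sketch of that idea, assuming s is a connected boost::asio tcp::socket; note this really does inject a byte into the stream, so the peer's protocol has to tolerate it, and the first write after the peer dies may still succeed, with the error only surfacing on a later write:
boost::system::error_code ec;
char dummy = 0;
s.write_some(boost::asio::buffer(&dummy, 1), ec);
if (ec == boost::asio::error::broken_pipe ||
    ec == boost::asio::error::connection_reset) {
    // The peer is gone; tear down and reconnect.
}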

Related

OpenSSL client stuck in endless read

I am using cpp-httplib to retrieve some data from a server using long polling (that is, the client will issue a request to the server, and the server will just keep the connection open until the required data is available or a timeout is reached).
The program is running on my Raspberry Pi, which sits behind a router that does not have an outgoing static IP address. Every time the IP is reassigned (or, at least, close to that time point), my program breaks, in that the thread currently performing the poll will be forever stuck in httplib::SSLClient::Get, which is caused by a blocking read() syscall. Both server and client timeouts are unable to do anything, while a connection close should make read() immediately return 0, which is what I would have expected in this situation.
Inspecting the program with gdb shows the following:
(gdb) thread 2
(gdb) where
__libc_read (nbytes=5, buf=0x75608edb, fd=3) at ../sysdeps/unix/sysv/linux/read.c:26
__libc_read (fd=3, buf=0x75608edb, nbytes=5) at ../sysdeps/unix/sysv/linux/read.c:24
0x76d1862c in ?? () from /usr/lib/arm-linux-gnueabihf/libcrypto.so.1.1
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I am not doing anything (as far as I know) that could accidentally overwrite return addresses.
For comparison, a 'healthy' stack trace during an SSLClient::Get can be found here.
The actual code is quite a lot, but here's a short version that shows the same behaviour:
#include <iostream>
#include <thread>

#define CPPHTTPLIB_OPENSSL_SUPPORT 1
#include "httplib.h"

void poll(httplib::SSLClient* c, char* path) {
    while (true) {
        auto response = c->Get(path);
        std::cout << response->body << std::endl;
    }
}

int main(int argc, char* argv[]) {
    if (argc >= 3) {
        httplib::SSLClient client(argv[1], 443, 20);
        std::thread poll_thread(poll, &client, argv[2]);
        poll_thread.join();
    } else {
        std::cerr << "Usage: ./poll <host> <path>" << std::endl;
        return 1;
    }
}
I can think of some workarounds that might or might not work, but I'd really like to know why and how this is happening in the first place.
Just expanding on the keep_alive option I mentioned in the comment.
In the scenario you described, it seems possible that the underlying TCP socket connection was terminated in an unclean fashion. I.e., you say the IP address was reassigned.
Ideally, when there is a TCP socket termination, you want your code to exit out of any blocked read/poll operation. That is what will happen for normal socket closures, e.g., if the remote process is killed, or the remote process just decides it is time to close. But if the IP address of your host is changed... I'm not sure there will necessarily be a low-level TCP message that says, to that effect, this connection is now closed. So the consequence for your program is that it can still hold a local socket (the local TCP endpoint) and not realise the connection has dropped.
This is where something like keep_alive comes in. The idea is that the kernel will send keep-alive packets to keep testing whether the connection is established; if these ever fail, it can close the local socket (and so your blocking read, or blocking select, will return with some sort of end-of-stream error).
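For reference, a sketch of what enabling keep-alive looks like on a plain POSIX/Linux socket descriptor; getting hold of the fd that cpp-httplib uses internally is the hard part and is not shown, and the options below are standard Linux socket options, not part of cpp-httplib's API:
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

// Sketch only: aggressive per-socket keep-alive tuning on Linux.
void enable_keepalive(int fd)
{
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));

    int idle = 60;      // seconds of idle time before the first probe
    int interval = 10;  // seconds between probes
    int count = 3;      // failed probes before the connection is reset
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof(interval));
    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count));
}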
Separately from keep_alive, you can also consider application heartbeat messages (e.g., websocket has ping/pong). In addition to ensuring the TCP connection remains established, this confirms whether the remote application is healthy.

Bind UDP socket to specific network interface using Boost.Asio

My PC has several network cards and I'm trying to receive UDP data from several broadcast devices. Each device is isolated on a dedicated network and I'm trying to read UDP data from multiple devices at the same time. I'm using Boost version 1.67. Let's pretend in this post that I want to get data from one only specific device, so I want to bind on a local network interface.
On Windows the following code works, but on my Ubuntu 16.04 64-bit machine it does not. Indeed, if I bind to one specific local IP address (192.168.1.1 in this example) I do not get any data. But if I use the ANY "0.0.0.0" address then I get what I want. Except that in that case I don't know where it comes from. It could be received by any network card!
Is this normal behavior? Or do I need to read the sender_endpoint to know that information on Linux and filter afterwards?
#include <iostream>
#include <boost/array.hpp>
#include <boost/asio.hpp>

using boost::asio::ip::udp;

int main(int argc, char* argv[])
{
    try
    {
        boost::asio::io_context io_context;

        // Setup UDP Socket
        udp::socket socket(io_context);
        socket.open(udp::v4());

        // Bind to specific network card and chosen port
        socket.bind(udp::endpoint(boost::asio::ip::address::from_string("192.168.1.1"), 2368));

        // Prepare to receive data
        boost::array<char, 128> recv_buf;
        udp::endpoint sender_endpoint;
        size_t len = socket.receive_from(boost::asio::buffer(recv_buf), sender_endpoint);

        // Write data to std output
        std::cout.write(recv_buf.data(), len);
    }
    catch (std::exception& e)
    {
        std::cerr << e.what() << std::endl;
    }
    return 0;
}
A little late, but others might come to this as well, as I have been attempting this with Boost and trying to figure out how it works. From reviewing this question: Fail to listen to UDP Port with boost::asio I went to this page: https://forums.codeguru.com/showthread.php?504427-boost-asio-receive-on-linux and it turns out that on Linux you need to bind to the "any" address in order to receive broadcast packets. So you would set this up as your receiving endpoint:
udp::endpoint(boost::asio::ip::address_v4::any(), port)
And then, yes, you would need to filter on the sender information. It seems a bit odd, but that appears to be the way Linux interfaces handle broadcasts.
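Adapting the asker's snippet, a sketch of the workaround might look like this (the 192.168.1. prefix check is only an example of filtering on the sender for this particular setup):
// Bind to the wildcard address so broadcasts are delivered...
udp::socket socket(io_context);
socket.open(udp::v4());
socket.bind(udp::endpoint(boost::asio::ip::address_v4::any(), 2368));

// ...then keep only datagrams whose sender is on the network you care about.
boost::array<char, 128> recv_buf;
udp::endpoint sender_endpoint;
size_t len = socket.receive_from(boost::asio::buffer(recv_buf), sender_endpoint);

std::string sender = sender_endpoint.address().to_string();
if (sender.rfind("192.168.1.", 0) == 0) {  // crude prefix check on the sender address
    std::cout.write(recv_buf.data(), len);
}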

How to recover from network interruption using boost::asio

I am writing a server that accepts data from a device and processes it. Everything works fine unless there is an interruption in the network (i.e., if I unplug the Ethernet cable, then reconnect it). I'm using read_until() because the protocol that the device uses terminates the packet with a specific sequence of bytes. When the data stream is interrupted, read_until() blocks, as expected. However, when the stream starts up again, it remains blocked. If I look at the data stream with Wireshark, the device continues transmitting and each packet is being ACK'ed by the network stack. But if I look at bytes_readable it is always 0.
How can I detect the interruption, and how do I re-establish a connection to the data stream? Below is a code snippet; thanks in advance for any help you can offer. [Go easy on me, this is my first Stack Overflow question... and yes I did try to search for an answer.]
using boost::asio::ip::tcp;

boost::asio::io_service IOservice;
tcp::acceptor acceptor(IOservice, tcp::endpoint(tcp::v4(), listenPort));
tcp::socket socket(IOservice);
acceptor.accept(socket);

for (;;)
{
    len = boost::asio::read_until(socket, sbuf, end);
    // Process sbuf
    // etc.
}
Remember, the client initiates a connection, so the only thing you need to achieve is to re-create the socket and start accepting again. I will keep the format of your snippet but I hope your real code is properly encapsulated.
using SocketType = boost::asio::ip::tcp::socket;

std::unique_ptr<SocketType> CreateSocketAndAccept(
    boost::asio::io_service& io_service,
    boost::asio::ip::tcp::acceptor& acceptor) {
  auto socket = std::make_unique<boost::asio::ip::tcp::socket>(io_service);
  boost::system::error_code ec;
  acceptor.accept(*socket.get(), ec);
  if (ec) {
    //TODO: Add handler.
  }
  return socket;
}
...
auto socket = CreateSocketAndAccept(IOservice, acceptor);
for (;;) {
  boost::system::error_code ec;
  auto len = boost::asio::read_until(*socket.get(), sbuf, end, ec);
  if (ec)  // you could be more picky here of course,
           // e.g. check against connection_reset, connection_aborted
    socket = CreateSocketAndAccept(IOservice, acceptor);
  ...
}
Footnote: Should go without saying, socket needs to stay in scope.
Edit: Based on the comments below.
The listening socket itself does not know whether a client is silent or whether it got cut off. All operations, especially synchronous ones, should impose a time limit on completion. Consider setting SO_RCVTIMEO or SO_KEEPALIVE (per socket, or system-wide; for more info see How to use SO_KEEPALIVE option properly to detect that the client at the other end is down?).
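For the synchronous case, one possible sketch is to set SO_RCVTIMEO on the native descriptor (POSIX shown; asio has no portable timeout for synchronous reads), after which the blocked read_until should come back with an error instead of waiting forever:
#include <sys/socket.h>
#include <sys/time.h>

// Sketch only: give synchronous reads on this socket an upper bound.
void set_receive_timeout(boost::asio::ip::tcp::socket& s, int seconds)
{
    struct timeval tv;
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    setsockopt(s.native_handle(), SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}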
Another option is to go async and implement a full-fledged "shared" socket server (the Boost example page is a great start).
Either way, you might run into data consistency issues and be forced to deal with them, e.g. when the client detects an interrupted connection, it would resend the data (or something more complex using higher-level protocols).
If you want to stay synchronous, the way I've seen things handled is to destroy the socket when you detect an interruption. The blocking call should throw an exception that you can catch and then start accepting connections again.
for (;;)
{
    try {
        len = boost::asio::read_until(socket, sbuf, end);
        // Process sbuf
        // etc.
    }
    catch (const boost::system::system_error& e) {
        // clean up. Start accepting new connections.
    }
}
As Tom mentions in his answer, there is no difference between inactivity and ungraceful disconnection so you need an external mechanism to detect this.
If you're expecting continuous data transfer, maybe a timeout per connection on the server side is enough. A simple ping could also work. After accepting a connection, ping your client every X seconds and declare the connection dead if it doesn't answer.

Socket is open after the process that opened it has finished

After closing the client socket on the server side and exiting the application, the socket stays open for some time.
I can see it via netstat
Every 0.1s: netstat -tuplna | grep 6676
tcp 0 0 127.0.0.1:6676 127.0.0.1:36065 TIME_WAIT -
I use log4cxx logging with the telnet appender. log4cxx uses APR sockets.
The Socket::close() method looks like this:
void Socket::close() {
    if (socket != 0) {
        apr_status_t status = apr_socket_close(socket);
        if (status != APR_SUCCESS) {
            throw SocketException(status);
        }
        socket = 0;
    }
}
And it is executed successfully. But after the program has finished I can still see the open socket via netstat, and if the program starts again log4cxx is unable to open port 6676, because it is busy.
I tried to modify log4cxx to shut down the socket before closing it:
void Socket::close() {
    if (socket != 0) {
        apr_status_t shutdown_status = apr_socket_shutdown(socket, APR_SHUTDOWN_READWRITE);
        printf("Socket::close shutdown_status %d\n", shutdown_status);
        if (shutdown_status != APR_SUCCESS) {
            printf("Socket::close WTF %d\n", shutdown_status != APR_SUCCESS);
            throw SocketException(shutdown_status);
        }
        apr_status_t close_status = apr_socket_close(socket);
        printf("Socket::close close_status %d\n", close_status);
        if (close_status != APR_SUCCESS) {
            printf("Socket::close WTF %d\n", close_status != APR_SUCCESS);
            throw SocketException(close_status);
        }
        socket = 0;
    }
}
But it didn't help; the bug is still reproduced.
This is not a bug. TIME_WAIT (and CLOSE_WAIT) is by design, for safety purposes. You may however adjust the wait time. In any case, from the server's perspective the socket is closed and released as far as the ulimit counter is concerned; it has not much visible impact unless you are doing a stress test.
As noted by Calvin, this isn't a bug, it's a feature. TIME_WAIT is a socket state that says: this socket isn't in use any more but nevertheless can't be reused quite yet.
Imagine you have a socket open and some client is sending data. The data may be backed up in the network or be in-flight when the server closes its socket.
Now imagine you start the service again or start some new service. The packets on the wire aren't aware that it's a new service, and the service can't know the packets were destined for a service that's gone. The new service may try to parse the packets and fail because they're in some odd format, or the client may get an unrelated error back and keep trying to send, maybe because the sequence numbers don't match and the receiving host will get some odd error. With TIME_WAIT the client will get notified that the socket is closed and the server won't potentially get odd data. A win-win. The time it waits should be sufficient for all in-transit data to be flushed from the system.
Take a look at this post for some additional info: Socket options SO_REUSEADDR and SO_REUSEPORT, how do they differ? Do they mean the same across all major operating systems?
TIME_WAIT is a socket state to allow all in-travel packets that could remain from the connection to arrive or die before the connection parameters (source address, source port, destination address, destination port) can be reused again. The kernel simply sets a timer to wait for this time to elapse before allowing you to reuse that socket again. But you cannot shorten it (even if you can, you had better not do it), because you have no way of knowing whether there are still packets travelling, nor of accelerating or killing them. The only possibility you have is to wait for a socket bound to that port to time out and pass from the TIME_WAIT state to the CLOSED state.
If you were allowed to reuse the connection (I think there's an option, or something that can be done in the Linux kernel) and you receive an old connection packet, you can get a connection reset due to the received packet. This can lead to more problems in the new connection. These are solved by making you wait for all traffic belonging to the old connection to die or reach its destination before you use that socket again.
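The option alluded to above is SO_REUSEADDR, which does not shorten TIME_WAIT but lets a restarted server bind the same port while old connections are still draining. A minimal sketch, set on the listening socket before bind() (with APR the equivalent would presumably be apr_socket_opt_set with APR_SO_REUSEADDR):
#include <sys/socket.h>

// Sketch only: allow rebinding a listening port that still has
// connections in TIME_WAIT. Must be set before bind().
void allow_rebind(int listen_fd)
{
    int on = 1;
    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
}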

zeromq: reset REQ/REP socket state

When you use the simple ZeroMQ REQ/REP pattern you depend on a fixed send()->recv() / recv()->send() sequence.
As this article describes you get into trouble when a participant disconnects in the middle of a request because then you can't just start over with receiving the next request from another connection but the state machine would force you to send a request to the disconnected one.
Has there emerged a more elegant way to solve this since the mentioned article has been written?
Is reconnecting the only way to solve this (apart from not using REQ/REP but using another pattern)?
As the accepted answer seems so terribly sad to me, I did some research and found that everything we need was actually in the documentation.
The .setsockopt() call with the correct parameters can help you reset your socket state machine without brutally destroying it and rebuilding another on top of the previous one's dead body.
(Yeah, I like the image.)
ZMQ_REQ_CORRELATE: match replies with requests
The default behaviour of REQ sockets is to rely on the ordering of messages to match requests and responses and that is usually sufficient. When this option is set to 1, the REQ socket will prefix outgoing messages with an extra frame containing a request id. That means the full message is (request id, 0, user frames…). The REQ socket will discard all incoming messages that don't begin with these two frames.
Option value type: int
Option value unit: 0, 1
Default value: 0
Applicable socket types: ZMQ_REQ
ZMQ_REQ_RELAXED: relax strict alternation between request and reply
By default, a REQ socket does not allow initiating a new request with zmq_send(3) until the reply to the previous one has been received. When set to 1, sending another message is allowed and has the effect of disconnecting the underlying connection to the peer from which the reply was expected, triggering a reconnection attempt on transports that support it. The request-reply state machine is reset and a new request is sent to the next available peer.
If set to 1, also enable ZMQ_REQ_CORRELATE to ensure correct matching of requests and replies. Otherwise a late reply to an aborted request can be reported as the reply to the superseding request.
Option value type: int
Option value unit: 0, 1
Default value: 0
Applicable socket types: ZMQ_REQ
A complete documentation is here
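A minimal sketch of setting those options from C++ (cppzmq-style; the 500 ms timeout is arbitrary). With REQ_RELAXED plus REQ_CORRELATE, a timed-out request can simply be retried on the same socket instead of recreating it:
#include <zmq.hpp>

// Sketch only: configure a REQ socket so the request/reply state machine
// can be reset by just sending again. Options must be set before connect().
void configure_req_socket(zmq::socket_t& req, const char* endpoint)
{
    int on = 1;
    int timeout_ms = 500;
    req.setsockopt(ZMQ_REQ_CORRELATE, &on, sizeof(on));
    req.setsockopt(ZMQ_REQ_RELAXED, &on, sizeof(on));
    req.setsockopt(ZMQ_RCVTIMEO, &timeout_ms, sizeof(timeout_ms));
    req.setsockopt(ZMQ_SNDTIMEO, &timeout_ms, sizeof(timeout_ms));
    req.connect(endpoint);
}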
The good news is that, as of ZMQ 3.0 and later (the modern era), you can set a timeout on a socket. As others have noted elsewhere, you must do this after you have created the socket, but before you connect it:
zmq_req_socket.setsockopt( zmq.RCVTIMEO, 500 ) # milliseconds
Then, when you actually try to receive the reply (after you have sent a message to the REP socket), you can catch the error that will be asserted if the timeout is exceeded:
try:
    send( message, 0 )
    send_failed = False
except zmq.Again:
    logging.warning( "Image send failed." )
    send_failed = True
However! When this happens, as observed elsewhere, your socket will be in a funny state, because it will still be expecting the response. At this point, I cannot find anything that works reliably other than just restarting the socket. Note that if you disconnect() the socket and then re connect() it, it will still be in this bad state. Thus you need to
def reset_my_socket():
    global zmq_req_socket   # rebind the module-level socket, not a local
    zmq_req_socket.close()
    zmq_req_socket = zmq_context.socket( zmq.REQ )
    zmq_req_socket.setsockopt( zmq.RCVTIMEO, 500 ) # milliseconds
    zmq_req_socket.connect( zmq_endpoint )
You will also notice that because I close()d the socket, the receive timeout option was "lost", so it is important to set it again on the new socket.
I hope this helps. And I hope that this does not turn out to be the best answer to this question. :)
There is one solution to this and that is adding timeouts to all calls. Since ZeroMQ by itself does not really provide simple timeout functionality I recommend using a subclass of the ZeroMQ socket that adds a timeout parameter to all important calls.
So, instead of calling s.recv() you would call s.recv(timeout=5.0), and if a response does not come back within that 5-second window it will return None and stop blocking. I made a futile attempt at this when I ran into this problem.
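In C++ the same idea is usually built on zmq_poll; a rough sketch of such a recv-with-timeout helper (older cppzmq API, names invented for illustration):
#include <zmq.hpp>

// Sketch only: wait until the socket is readable or the timeout expires.
bool recv_with_timeout(zmq::socket_t& sock, zmq::message_t& msg, long timeout_ms)
{
    zmq::pollitem_t items[] = { { static_cast<void*>(sock), 0, ZMQ_POLLIN, 0 } };
    zmq::poll(items, 1, timeout_ms);
    if (items[0].revents & ZMQ_POLLIN) {
        sock.recv(&msg);   // a message is queued, so this won't block
        return true;
    }
    return false;          // timed out; the caller decides how to recover
}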
I'm actually looking into this at the moment, because I am retrofitting a legacy system.
I am constantly coming across code that "needs" to know about the state of the connection. However, the thing is that I want to move to the message-passing paradigm that the library promotes.
I found the following function : zmq_socket_monitor
What it does is monitor the socket passed to it and generate events that are then passed to an "inproc" endpoint - at that point you can add handling code to actually do something.
There is also an example (actually test code) here : github
I have not got any specific code to give at the moment (maybe at the end of the week) but my intention is to respond to the connect and disconnects such that I can actually perform any resetting of logic required.
Hope this helps, and despite quoting the 4.2 docs, I am using 4.0.4 which seems to have the functionality as well.
Note: I notice you talk about Python above, but the question is tagged C++, so that's where my answer is coming from...
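For reference, a rough sketch of what hooking up zmq_socket_monitor can look like with the C API: events about the monitored socket are published on an inproc endpoint and read back with a PAIR socket (parsing of the event frames is elided; see the libzmq test code linked above for the exact format):
#include <zmq.h>

// Sketch only: subscribe to connection events for an existing socket.
void start_monitor(void* ctx, void* monitored_socket)
{
    // Ask libzmq to publish connect/disconnect (and other) events.
    zmq_socket_monitor(monitored_socket, "inproc://conn-monitor", ZMQ_EVENT_ALL);

    // Read the events back on a PAIR socket.
    void* mon = zmq_socket(ctx, ZMQ_PAIR);
    zmq_connect(mon, "inproc://conn-monitor");

    // First frame: event number and value; second frame: endpoint address.
    zmq_msg_t event_frame;
    zmq_msg_init(&event_frame);
    zmq_msg_recv(&event_frame, mon, 0);
    // ... inspect the event id and handle ZMQ_EVENT_DISCONNECTED etc. ...
    zmq_msg_close(&event_frame);
}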
Update: I'm updating this answer with this excellent resource here: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/ Socket programming is complicated, so do check out the references in this post.
None of the answers here seem accurate or useful. The OP is not looking for information on BSD socket programming. He is trying to figure out how to robustly handle accept()ed client-socket failures in ZMQ on the REP socket to prevent the server from hanging or crashing.
As already noted -- this problem is complicated by the fact that ZMQ tries to pretend that the server's listen()ing socket is the same as an accept()ed socket (and there is nowhere in the documentation that describes how to set basic timeouts on such sockets).
My answer:
After doing a lot of digging through the code, the only relevant socket options passed along to accept()ed sockets seem to be the keep-alive options from the parent listen()er. So the solution is to set the following options on the listening socket before calling send or recv:
void zmq_setup(zmq::context_t** context, zmq::socket_t** socket, const char* endpoint)
{
    // Free old references.
    if(*socket != NULL)
    {
        (**socket).close();
        (**socket).~socket_t();
    }
    if(*context != NULL)
    {
        // Shutdown all previous server client-sockets.
        zmq_ctx_destroy((*context));
        (**context).~context_t();
    }
    *context = new zmq::context_t(1);
    *socket = new zmq::socket_t(**context, ZMQ_REP);

    // Enable TCP keep alive.
    int is_tcp_keep_alive = 1;
    (**socket).setsockopt(ZMQ_TCP_KEEPALIVE, &is_tcp_keep_alive, sizeof(is_tcp_keep_alive));

    // Only send 2 probes to check if client is still alive.
    int tcp_probe_no = 2;
    (**socket).setsockopt(ZMQ_TCP_KEEPALIVE_CNT, &tcp_probe_no, sizeof(tcp_probe_no));

    // How long does a con need to be "idle" for in seconds.
    int tcp_idle_timeout = 1;
    (**socket).setsockopt(ZMQ_TCP_KEEPALIVE_IDLE, &tcp_idle_timeout, sizeof(tcp_idle_timeout));

    // Time in seconds between individual keep alive probes.
    int tcp_probe_interval = 1;
    (**socket).setsockopt(ZMQ_TCP_KEEPALIVE_INTVL, &tcp_probe_interval, sizeof(tcp_probe_interval));

    // Discard pending messages in buf on close.
    int is_linger = 0;
    (**socket).setsockopt(ZMQ_LINGER, &is_linger, sizeof(is_linger));

    // TCP user timeout on unacknowledged send buffer.
    int is_user_timeout = 2;
    (**socket).setsockopt(ZMQ_TCP_MAXRT, &is_user_timeout, sizeof(is_user_timeout));

    // Start internal enclave event server.
    printf("Host: Starting enclave event server\n");
    (**socket).bind(endpoint);
}
What this does is tell the operating system to aggressively check the client socket for timeouts and reap them for cleanup when a client doesn't return a heart beat in time. The result is that the OS will send a SIGPIPE back to your program and socket errors will bubble up to send / recv - fixing a hung server. You then need to do two more things:
1. Handle SIGPIPE errors so the program doesn't crash
#include <signal.h>
#include <zmq.hpp>

// zmq_setup def here [...]

int main(int argc, char** argv)
{
    // Ignore SIGPIPE signals.
    signal(SIGPIPE, SIG_IGN);

    // ... rest of your code after
    // (Could potentially also restart the server
    // sock on N SIGPIPEs if you're paranoid.)

    // Start server socket.
    const char* endpoint = "tcp://127.0.0.1:47357";
    zmq::context_t* context = NULL;   // must start as NULL so zmq_setup's checks work
    zmq::socket_t* socket = NULL;
    zmq_setup(&context, &socket, endpoint);

    // Message buffers.
    zmq::message_t request;
    zmq::message_t reply;

    // ... rest of your socket code here
}
2. Check for -1 returned by send or recv and catch ZMQ errors.
// E.g. skip broken accepted sockets (pseudo-code.)
while (1)
{
    try
    {
        if ((*socket).recv(&request) == -1)
            throw -1;
    }
    catch (...)
    {
        // Prevent any endless error loops killing CPU.
        sleep(1);

        // Reset ZMQ state machine.
        try
        {
            zmq::message_t blank_reply = zmq::message_t();
            (*socket).send(blank_reply);
        }
        catch (...)
        {
        }
        continue;
    }
}
Notice the weird code that tries to send a reply on a socket failure? In ZMQ, a REP server "socket" is an endpoint to another program making a REQ socket to that server. The result is that if you do a recv on a REP socket with a hung client, the server socket becomes stuck in a broken receive loop where it will wait forever to receive a valid reply.
To force an update on the state machine, you try send a reply. ZMQ detects that the socket is broken, and removes it from its queue. The server socket becomes "unstuck", and the next recv call returns a new client from the queue.
To enable timeouts on an async client (in Python 3), the code would look something like this:
import asyncio
import zmq
import zmq.asyncio

ctx = zmq.asyncio.Context()

@asyncio.coroutine
def req(endpoint):
    ms = 2000 # In milliseconds.
    sock = ctx.socket(zmq.REQ)
    sock.setsockopt(zmq.SNDTIMEO, ms)
    sock.setsockopt(zmq.RCVTIMEO, ms)
    sock.setsockopt(zmq.LINGER, ms) # Discard pending buffered socket messages on close().
    sock.setsockopt(zmq.CONNECT_TIMEOUT, ms)

    # Connect the socket.
    # Connections don't strictly happen here.
    # ZMQ waits until the socket is used (which is confusing, I know.)
    sock.connect(endpoint)

    # Send some bytes.
    yield from sock.send(b"some bytes")

    # Recv bytes and convert to unicode.
    msg = yield from sock.recv()
    msg = msg.decode(u"utf-8")
Now you have some failure scenarios when something goes wrong.
By the way -- if anyone's curious -- the default value for TCP idle timeout in Linux seems to be 7200 seconds or 2 hours. So you would be waiting a long time for a hung server to do anything!
Sources:
https://github.com/zeromq/libzmq/blob/84dc40dd90fdc59b91cb011a14c1abb79b01b726/src/tcp_listener.cpp#L82 TCP keep alive options preserved for client sock
http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/ How does keep alive work
https://github.com/zeromq/libzmq/blob/master/builds/zos/README.md Handling sig pipe errors
https://github.com/zeromq/libzmq/issues/2586 for information on closing sockets
https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
https://github.com/zeromq/libzmq/issues/976
Disclaimer:
I've tested this code and it seems to be working, but ZMQ does complicate testing this a fair bit because the client re-connects on failure? If anyone wants to use this solution in production, I recommend writing some basic unit tests, first.
The server code could also be improved a lot with threading or polling to be able to handle multiple clients at once. As it stands, a malicious client can temporarily take up resources from the server (3 second timeout) which isn't ideal.