C++ / gloox: how to check when the connection is down?

I'm trying to write my own Jabber bot in C++ with gloox. Everything goes fine, but when the internet connection goes down the bot still thinks it is connected, and when the connection comes back up the bot of course doesn't respond to any messages.
Once the bot has connected successfully, every call to gloox's recv() returns ConnNoError, even if the interface is down and the cable unplugged.
I tried blocking and non-blocking gloox connections and recv(), all without any result. Periodically checking the availability of the XMPP server from a different thread doesn't seem like a good idea, so how can I properly check whether the bot is connected right now or not?
If it's not possible with gloox alone, please point me to some good method, but let it be available on Unix.

I had the same question, and found the reason why recv() always returns ConnNoError. Here is what I found. When the connection is established, recv() calls a function named dataAvailable() in ConnectionTCPBase.cpp, which returns
( ( select( m_socket + 1, &fds, 0, 0, timeout == -1 ? 0 : &tv ) > 0 ) && FD_ISSET( m_socket, &fds ) != 0 )
Searching Google, I found a thread which said that FD_ISSET( m_socket, &fds ) detects whether the socket is readable, not whether it is closed. The return value of FD_ISSET( m_socket, &fds ) stays 0 even when the network is down. In that case the return value of dataAvailable() is false, so the code below finally returns ConnNoError in recv():
if( !dataAvailable( timeout ) )
{
    m_recvMutex.unlock();
    return ConnNoError;
}
I don't know whether this is a bug or not; it seems not.
Later I tried another way: write to the socket directly, which causes a SIGPIPE if the socket is closed; catch that signal, then use cleanup to disconnect.
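A minimal sketch of that workaround on POSIX (the helper names are illustrative, not gloox API; the single-space probe relies on XMPP servers tolerating whitespace between stanzas):
#include <cerrno>
#include <csignal>
#include <sys/socket.h>
#include <sys/types.h>

// Call once at startup: ignore SIGPIPE process-wide so that writing to
// a dead socket returns -1 with errno == EPIPE instead of killing the
// process.
void setupSigpipe()
{
    std::signal(SIGPIPE, SIG_IGN);
}

// Probe the connection by writing a single space ("whitespace
// keepalive"). Note that the first write after a silent link failure
// may still succeed because TCP buffers it; EPIPE typically appears
// only once the peer has rejected earlier data.
bool socketAlive(int fd)
{
    ssize_t n = send(fd, " ", 1, MSG_NOSIGNAL); // MSG_NOSIGNAL: Linux
    return !(n == -1 && (errno == EPIPE || errno == ECONNRESET));
}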
I finally figured out a graceful solution to this problem, using a heartbeat.
In the gloox thread, call heartBeat(), where m_pClient is a pointer to an instance of gloox::Client:
void CXmpp::heartBeat()
{
    m_pClient->xmppPing(m_pClient->jid(), this);
    if (++heart > 3) {
        m_pClient->disconnect();
    }
}
xmppPing() registers itself with the event handler; when the ping comes back, it calls handleEvent(), and in handleEvent():
void CEventHandler::handleEvent(const Event& event)
{
    std::string sEvent;
    switch (event.eventType())
    {
        case Event::PingPing:
            sEvent = "PingPing";
            break;
        case Event::PingPong:
            sEvent = "PingPong";
            // received a pong from the server, decrease the heart count
            --heart;
            break;
        case Event::PingError:
            sEvent = "PingError";
            break;
        default:
            break;
    }
    return;
}
Connect to the server, turn off the network, and three seconds later I get a disconnect!
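For context, a sketch of the loop that could drive heartBeat(); run(), m_running and the one-second interval are illustrative, while connect(false) and recv() are the gloox calls:
// Receive loop driving the heartbeat (names hypothetical; <ctime> needed).
void CXmpp::run()
{
    if (!m_pClient->connect(false))      // false: non-blocking connect
        return;
    time_t lastBeat = time(NULL);
    while (m_running)
    {
        m_pClient->recv(1000000);        // wait at most 1 s (microseconds)
        if (time(NULL) - lastBeat >= 1)
        {
            heartBeat();                 // ping; disconnect after 3 missed pongs
            lastBeat = time(NULL);
        }
    }
}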

You have to define onDisconnect(ConnectionError e) in a gloox::ConnectionListener to be able to handle the disconnect event. The documentation is at http://camaya.net/api/gloox-0.9.9.12/classgloox_1_1ConnectionListener.html#a2
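A minimal sketch of such a listener; apart from the gloox ConnectionListener interface itself, the class name is illustrative:
#include <gloox/connectionlistener.h>

class BotConnectionListener : public gloox::ConnectionListener
{
public:
    virtual void onConnect()
    {
        // stream established and authenticated
    }
    // called by gloox whenever the stream goes down, with the reason
    virtual void onDisconnect(gloox::ConnectionError e)
    {
        // inspect e for the reason; schedule a reconnect here
    }
    virtual bool onTLSConnect(const gloox::CertInfo& info)
    {
        return true; // accept the certificate (don't do this blindly in production)
    }
};

// register it once: m_pClient->registerConnectionListener(&listener);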

Related

Detect closed TCP connection during write with boost::asio immediately

I have a TCP server with multiple clients/sessions. Each session has its own thread for receiving data from the client, but there is only one thread ("writeThread") to respond to all clients.
Now there is the problem: if a client closes the connection while the writeThread is writing to this socket, it takes multiple seconds until the write operation notices that the connection was closed remotely. Sometimes it isn't noticed at all; only when I manually send a signal for an installed signal handler does the application detect it and break the write operation.
The time is measured between Logger::trace("start write"); and Logger::trace("remote term, closed socket ");
Despite the fact that this may not be the best design, is there a possibility to detect the closed connection immediately, or do I really have to redesign?
bool myWrite(UINT8 *pu8_buffer, UINT32 u32_size)
{
    bool b_success = false;
    try
    {
        Logger::trace("start write");
        b_success = (u32_size == boost::asio::write(_x_socket, boost::asio::buffer(pu8_buffer, u32_size)));
    }
    catch (boost::system::system_error &er)
    {
        if (er.code() == boost::asio::error::eof ||
            er.code() == boost::asio::error::connection_reset)
        {
            boost::system::error_code x_er;
            _x_socket.close(x_er);
            if (!x_er)
            {
                Logger::trace("remote term, closed socket ");
            }
            else
            {
                Logger::err("remote term, closed socket failed");
            }
        }
    }
    catch (std::exception &ex)
    {
        Logger::err("write exception\n\t", ex.what());
    }
    catch (...)
    {
        Logger::err("write unknown exception", (uint32_t)this);
    }
    return b_success;
}
If you use asynchronous write operations, it is possible to multiplex writes on the same thread without one blocking the other. You can even do the same for reads.
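A sketch of what that could look like for the write above; asyncWrite, BufferPtr and onWriteDone are illustrative names, not part of Boost.Asio:
#include <vector>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>

typedef boost::shared_ptr<std::vector<unsigned char> > BufferPtr;

// completion handler: runs on the io_service thread when the write has
// finished or failed; holding the BufferPtr keeps the data alive
void onWriteDone(const boost::system::error_code& ec, std::size_t /*n*/,
                 BufferPtr /*buffer*/)
{
    if (ec == boost::asio::error::eof ||
        ec == boost::asio::error::connection_reset)
    {
        // peer closed the connection: tear the session down here
    }
}

// queues the whole buffer and returns immediately, so one dead client
// can no longer stall the shared write thread
void asyncWrite(boost::asio::ip::tcp::socket& socket, BufferPtr buffer)
{
    boost::asio::async_write(socket, boost::asio::buffer(*buffer),
        boost::bind(&onWriteDone,
                    boost::asio::placeholders::error,
                    boost::asio::placeholders::bytes_transferred,
                    buffer));
}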
Just to close this question, I will summarize Richard Critten's comments.
The issue was that there was no graceful disconnect from the client connected to my server. If there is a proper disconnect, the write function breaks immediately. To avoid long or infinite blocking of the write operation, it is possible to configure a timeout for how long a write operation may take before reporting an error. This timeout can be set with the SO_SNDTIMEO socket option: http://man7.org/linux/man-pages/man7/socket.7.html
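A sketch of setting it on the socket used by myWrite() above; Boost.Asio has no built-in option for SO_SNDTIMEO, so it goes through the native descriptor (Linux shown, error handling omitted; this affects blocking sends):
#include <sys/socket.h>
#include <sys/time.h>
#include <boost/asio.hpp>

void setWriteTimeout(boost::asio::ip::tcp::socket& s)
{
    // give blocking sends a 5-second ceiling: after that, the write
    // fails with an error instead of blocking indefinitely
    timeval tv;
    tv.tv_sec  = 5;
    tv.tv_usec = 0;
    // use s.native() instead of native_handle() on older Boost versions
    setsockopt(s.native_handle(), SOL_SOCKET, SO_SNDTIMEO,
               reinterpret_cast<const char*>(&tv), sizeof(tv));
}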

SSL_shutdown returns -1 with SSL_ERROR_WANT_READ infinitely long

I cannot understand how to properly use the SSL_shutdown command in OpenSSL. Similar questions have arisen several times in different places, but I couldn't find a solution that matches my situation exactly. I am using package libssl-dev 1.0.1f-1ubuntu2.15 (the latest for now) under Ubuntu in VirtualBox.
I am working with a small legacy C++ wrapper over the OpenSSL library, with non-blocking IO for server and client sockets. The wrapper seems to work fine, except in the following test case (I'm not providing the code of the unit test itself, because it contains a lot of code not related to the problem):
Initialize a server socket with a self-signed certificate.
Connect to that socket. The SSL handshake completes successfully, except that I'm ignoring the X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT return of SSL_get_verify_result for now.
Successfully send/receive some data through the connection. This step is optional and doesn't affect the problem which follows; I mention it only to show that the connection is really established and in a correct state.
Try to shut down the SSL connection (server or client, it doesn't matter which one), which leads to an infinite wait on select.
All of the calls to SSL_read and SSL_write are synchronized, and a locking callback is also set. After step 3 there are no other operations on the sockets except the shutdown on one of them.
In the code snippet below I omit all of the error processing and debugging code for clarity; none of the OpenSSL/POSIX calls fail (except in the cases where I left the error processing in place). I also provide the connect functions, in case this is important:
void OpenSslWrapper::ConnectToHost( ErrorCode& ec )
{
    ctx_ = SSL_CTX_new(SSLv23_client_method());
    SSL_CTX_load_verify_locations(ctx_, NULL, config_.verify_locations.c_str());
    if (config_.use_standard_verify_locations)
    {
        SSL_CTX_set_default_verify_paths(ctx_);
    }
    bio_ = BIO_new_ssl_connect(ctx_);
    BIO_get_ssl(bio_, &ssl_);
    SSL_set_mode(ssl_, SSL_MODE_AUTO_RETRY);
    std::string hostname = config_.address + ":" + to_string(config_.port);
    BIO_set_conn_hostname(bio_, hostname.c_str());
    BIO_set_nbio(bio_, 1);
    int res = 0;
    while ((res = BIO_do_connect(bio_)) <= 0)
    {
        BIO_get_fd(bio_, &fd_);
        if (!BIO_should_retry(bio_))
        { /* Never happens */ }
        WaitAfterError(res);
    }
    res = SSL_get_verify_result(ssl_);
    if (res != X509_V_OK && res != X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT)
    { /* Never happens */ }
    SSL_set_mode(ssl_, SSL_MODE_ENABLE_PARTIAL_WRITE);
    SSL_set_mode(ssl_, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);
}
// config_.handle is a file descriptor obtained from the accept
// function; ctx_ is also set up in advance
void OpenSslWrapper::ConnectAsServer( ErrorCode& ec )
{
    ssl_ = SSL_new(ctx_);
    int flags = fcntl(config_.handle, F_GETFL, 0);
    flags |= O_NONBLOCK;
    fcntl(config_.handle, F_SETFL, flags);
    SSL_set_fd(ssl_, config_.handle);
    while (true)
    {
        int res = SSL_accept(ssl_);
        if (res > 0) { break; }
        if (!WaitAfterError(res).isSucceded())
        { /* never happens */ }
    }
    SSL_set_mode(ssl_, SSL_MODE_ENABLE_PARTIAL_WRITE);
    SSL_set_mode(ssl_, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);
}
// The trouble is here
void OpenSslWrapper::Shutdown()
{
    // ...
    while (true)
    {
        int ret = SSL_shutdown(ssl_);
        if (ret > 0) { break; }
        else if (ret == 0) { continue; }
        else { WaitAfterError(ret); }
    }
    // ...
}
ErrorCode OpenSslWrapper::WaitAfterError(int res)
{
    int err = SSL_get_error(ssl_, res);
    switch (err)
    {
        case SSL_ERROR_WANT_READ:
            WaitForFd(fd_, k_WaitRead);
            return ErrorCode::Success;
        case SSL_ERROR_WANT_WRITE:
        case SSL_ERROR_WANT_CONNECT:
        case SSL_ERROR_WANT_ACCEPT:
            WaitForFd(fd_, k_WaitWrite);
            return ErrorCode::Success;
        default:
            return ErrorCode::Fail;
    }
}
WaitForFd is just a simple wrapper over select(), which waits infinitely long on a given socket with the specified FD_SET for read or write.
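For completeness, such a wrapper might look like this (a sketch following the description above; k_WaitRead/k_WaitWrite are the question's constants):
#include <sys/select.h>

// block indefinitely until fd is ready for reading or writing
void WaitForFd(int fd, int mode)
{
    fd_set fds;
    FD_ZERO(&fds);
    FD_SET(fd, &fds);
    if (mode == k_WaitRead)
        select(fd + 1, &fds, NULL, NULL, NULL);   // wait for readability
    else
        select(fd + 1, NULL, &fds, NULL, NULL);   // wait for writability
}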
When the client calls Shutdown(), the first call to SSL_shutdown returns 0. After the second call it returns -1 and SSL_get_error returns SSL_ERROR_WANT_READ, but selecting on the file descriptor for reading never returns. If I specify a timeout on select, SSL_shutdown keeps returning -1 and SSL_get_error keeps returning SSL_ERROR_WANT_READ; the loop never exits. After the first call to SSL_shutdown the shutdown status is always SSL_SENT_SHUTDOWN.
It doesn't matter if I close a server or a client: both have the same behavior.
There's also a strange situation when I connect to some external host. The first call to SSL_shutdown returns 0, the second one -1 with SSL_ERROR_WANT_READ. Selecting on the socket finishes successfully, but the next time I call SSL_shutdown I again get -1, this time with SSL_ERROR_SYSCALL and errno == 0. As I read in other places this is not a big deal, although it still seems strange and may be somehow related, so I mention it here.
UPD. I ported the same code to Windows; the behavior didn't change.
P.S. I am sorry for mistakes in my English, I'd be grateful if someone corrects my language.

address reuse error when using fork() + execlp with boost::asio in Linux

I have a program which listens on a TCP port for a particular string and launches an application using an execlp call. I do a fork() to launch a child process before this execlp call. After the launch, the parent process starts listening on the same port again. I close the socket in the child process.
I have written a wrapper over boost::asio::tcp_socket where I set the addr_reuse option to true before binding the socket.
Now my problem is that on Linux I get an address reuse error after a few launches of the application. My program continuously tries to accept connections (or more precisely, tries to schedule an accept on the boost::asio::io_service) until bind and then accept succeed. So I receive the error in this loop.
Strangely, if I close (or kill) the launched executable this error stops coming, which means bind succeeds. I am sure that the same port is not being used anywhere in the launched application.
I am using asynchronous socket operations. Any idea why I am getting this error?
Here is how I accept on the socket (I also call reset on the boost::asio::tcp_socket (_tcpSocket) shared pointer before starting a new accept):
boost::asio::ip::tcp::endpoint endPoint(boost::asio::ip::tcp::v4(), port);
_acceptor.reset( new boost::asio::ip::tcp::acceptor( *_ioService.get() ) );
_acceptor->open( endPoint.protocol() );
_acceptor->set_option(boost::asio::ip::tcp::acceptor::reuse_address(true));
boost::system::error_code ec;
_acceptor->bind(endPoint, ec);
if ( ec.value() != boost::system::errc::success )
{
    ec.clear();
    _acceptor->close(ec);
    close();
    return false;
}
ec.clear();
_acceptor->listen(boost::asio::socket_base::max_connections, ec);
if ( ec.value() != boost::system::errc::success )
{
    return false;
}
_acceptor->async_accept(*_tcpSocket,
    boost::bind(&TCPSocket::_handleAsyncAccept,
                this,
                boost::asio::placeholders::error) );
Here is how I am forking:
pid_t pid = fork();
switch (pid)
{
    case 0:
    {
        /// close all sockets for the child process, as keeping them
        /// open might cause the addr reuse error in the parent process
        _asyncNO->closeAll();
        std::string binary = "<binaryName>";
        std::string path = "<binaryPath>";
        if ( execlp( path.c_str(), binary.c_str(), controllerIP.c_str(), (char *)0 ) == -1 )
        {
            LOG_ERROR("System call failed !!");
        }
    }
    break;
    default:
        break;
}
I have removed logging for simplicity.
As @TannerSansbury said in the comments, this is likely because Boost.Asio needs to be notified of fork():
See the newer version of the documentation on forking with Boost.Asio. The relevant section is reproduced here:
Boost.Asio supports programs that utilise the fork() system call. Provided the program calls io_service.notify_fork() at the appropriate times, Boost.Asio will recreate any internal file descriptors (such as the "self-pipe trick" descriptor used for waking up a reactor). The notification is usually performed as follows:
io_service_.notify_fork(boost::asio::io_service::fork_prepare);
if (fork() == 0)
{
    io_service_.notify_fork(boost::asio::io_service::fork_child);
    // ...
}
else
{
    io_service_.notify_fork(boost::asio::io_service::fork_parent);
    // ...
}
User-defined services can also be made fork-aware by overriding the io_service::service::fork_service() virtual function.
Note that any file descriptors accessible via Boost.Asio's public API (e.g. the descriptors underlying basic_socket<>, posix::stream_descriptor, etc.) are not altered during a fork. It is the program's responsibility to manage these as required.
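Applied to the launcher from the question, the pattern would look roughly like this (a sketch; _ioService, _acceptor, path, binary and controllerIP are the names used in the code above):
// bracket the fork with notify_fork, and close the listening socket in
// the child before execlp so only the parent keeps accepting on the port
_ioService->notify_fork(boost::asio::io_service::fork_prepare);
pid_t pid = fork();
if (pid == 0)
{
    _ioService->notify_fork(boost::asio::io_service::fork_child);
    _acceptor->close();  // the child must not hold the listening descriptor
    execlp(path.c_str(), binary.c_str(), controllerIP.c_str(), (char *)0);
    _exit(127);          // only reached if execlp fails
}
else
{
    _ioService->notify_fork(boost::asio::io_service::fork_parent);
}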

How can I clean up properly when recv is blocking?

Consider the example code below (I typed it up quickly as an example; if there are errors it doesn't matter - I'm interested in the theory).
bool shutDown = false; // global

int main()
{
    CreateThread(NULL, 0, &MessengerLoop, NULL, 0, NULL);
    // do other programmy stuff...
}

DWORD WINAPI MessengerLoop( LPVOID lpParam )
{
    zmq::context_t context(1);
    zmq::socket_t socket(context, ZMQ_SUB);
    socket.connect("tcp://localhost:5556");
    socket.setsockopt(ZMQ_SUBSCRIBE, "10001 ", 6);
    while (!shutDown)
    {
        zmq_msg_t getMessage;
        zmq_msg_init(&getMessage);
        zmq_msg_recv(&getMessage, socket, 0); // this line will wait forever for a message
        processMessage(getMessage);
    }
    return 0;
}
A thread is created to wait for incoming messages and to handle them appropriately. The thread is looping until shutDown is set to true.
In ZeroMQ the Guide specifically states what must be cleaned up, namely the messages, socket and context.
My issue is: since recv will wait forever for a message, blocking the thread, how can I shut down this thread safely if a message is never received?
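For reference, the cleanup the Guide asks for looks like this in C API terms (a sketch; with the zmq::context_t/zmq::socket_t wrappers above, the socket and context parts run in their destructors):
// release resources in this order: message, socket, then context;
// zmq_ctx_destroy blocks until all sockets are closed (see ZMQ_LINGER)
zmq_msg_close(&getMessage);
zmq_close(socket);
zmq_ctx_destroy(context);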
The blocking call will exit in a few ways. First, and this depends on your language and binding, an interrupt (Ctrl-C, SIGINT, SIGTERM) will exit the call. You'll get back (again, depending on your binding) an error or a null message (libzmq returns an EINTR error).
Second, if you terminate the context in another thread, the blocking call will also exit (libzmq returns an ETERM error).
Thirdly, you can set timeouts on the socket so it will return in any case after some timeout, if there's no data. We don't often do this but it can be useful in some cases.
Finally, what we do in practice is never do blocking receives but use zmq_poll to find out when sockets have messages waiting, then receive from those sockets. This is how you scale out to handling more sockets.
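A sketch of that last approach applied to the loop from the question (the 100 ms timeout is arbitrary):
// poll with a short timeout so the shutDown flag is re-checked
// regularly; only receive when the socket is actually readable
while (!shutDown)
{
    zmq_pollitem_t items[1];
    items[0].socket = socket;  // zmq::socket_t converts to void*
    items[0].fd = 0;
    items[0].events = ZMQ_POLLIN;
    items[0].revents = 0;
    int rc = zmq_poll(items, 1, 100);  // timeout in milliseconds
    if (rc == -1)
        break;                         // interrupted, or context terminated
    if (items[0].revents & ZMQ_POLLIN)
    {
        zmq_msg_t getMessage;
        zmq_msg_init(&getMessage);
        zmq_msg_recv(&getMessage, socket, 0); // data is ready: won't block
        processMessage(getMessage);
        zmq_msg_close(&getMessage);
    }
}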
You can use the non-blocking call flag ZMQ_DONTWAIT:
while (!shutDown)
{
    zmq_msg_t getMessage;
    zmq_msg_init(&getMessage);
    while (-1 == zmq_msg_recv(&getMessage, socket, ZMQ_DONTWAIT))
    {
        if (EAGAIN != errno || shutDown)
        {
            break;
        }
        Sleep(100);
    }
    processMessage(getMessage);
}
Whenever the zmq context is destroyed, zmq_msg_recv will return -1. I use this as the terminating condition in all of my code.
while (!shutdown)
{
    // ...
    int rc = zmq_msg_recv(&getMessage, socket, 0);
    if (rc != -1)
    {
        processMessage(getMessage);
    }
    else
    {
        break;
    }
}
Remember to destroy the zmq context at the end of your main() for a proper clean-up.
zmq_ctx_destroy(zctx);
Let's say you have a class, say SUB (subscriber), that manages receiving your ZMQ messages. In the destructor or exit function of your main function/class, call the following:
pub->close();
///
/// Close the publish context
///
void PUB::close()
{
    zmq_close(socket);
    zmq_ctx_destroy(context);
}
This ensures that the blocking recv terminates with an error that you can ignore. The application will then exit cleanly, in the right way. This is the right method. Good luck!

UDP socket problem

I'm writing a multiplayer game (obviously using UDP sockets; note: using Winsock 2.2). The server code reads something like this:
while (run)
{
    select(0, &readSockets, NULL, NULL, &t);
    if (FD_ISSET(serverSocket, &readSockets))
    {
        printf("%s\n", "Data received");
        // recvfrom over here
    }
    FD_SET(serverSocket, &readSockets);
}
While this is not receiving data from my client, this is:
recvfrom(serverSocket, buffer, sizeof(buffer), 0, &client, &client_size);
One possible issue here is the select() call: I believe the first parameter needs to be the highest socket descriptor + 1.
The FD_SET is at the end of the loop, so it looks like your first call to select() may have an empty or uninitialized fd_set. Make sure you use FD_ZERO(&readSockets) and FD_SET(serverSocket, &readSockets) before your loop. It would also be good to check for errors on the select() call; see the sketch below.
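Putting the two answers together, a more defensive version of the loop might look like this (a sketch; re-arming the set and timeout inside the loop avoids relying on the FD_SET at the bottom, since select() modifies both):
fd_set readSockets;
timeval t;
while (run)
{
    FD_ZERO(&readSockets);
    FD_SET(serverSocket, &readSockets);
    t.tv_sec = 1;   // 1-second timeout, reset each pass
    t.tv_usec = 0;
    // nfds is ignored by Winsock but must be maxfd + 1 on POSIX
    int nret = select((int)serverSocket + 1, &readSockets, NULL, NULL, &t);
    if (nret == SOCKET_ERROR)
    {
        // inspect WSAGetLastError() here
        break;
    }
    if (nret > 0 && FD_ISSET(serverSocket, &readSockets))
    {
        printf("%s\n", "Data received");
        // recvfrom over here
    }
    // nret == 0 means the timeout expired with no data
}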
Hmmm... after fiddling with the code a bit, I found these lines:
console->clear();
console->resetCursorPosition();
So, it was receiving data, but the message on the console was getting erased instantly. [sigh]
You are supposed to check for errors returned by select(). On Windows this would be something like:
if (( nret = select( nfds, &rset, &wset, &eset, &to )) == SOCKET_ERROR )
{
    // error handling, probably with WSAGetLastError()
    // ...
}
Since it looks like you are using a timeout, select() can also return zero, i.e. no socket descriptors are ready and the timeout expired.