boost asio timer hangs the second time I read async - c++

I have an RS485 connection that I talk over with boost::asio::serial_port. I'm testing what happens when there is no connection. The first time, the timer cancels the operation. The second time, the program just hangs on io.run(). I'm confused, because I create a fresh timer, reset io, and try to have a clean slate before I start to read anything from the wire.
boost::asio::io_context io;
boost::asio::serial_port port(io);

std::size_t readUntil(std::vector<char>& buffer, char delim, std::chrono::microseconds timeout) {
    using boost::system::error_code;
    error_code read_result {};
    size_t msglen = 0;

    io.reset();
    boost::asio::system_timer timer(io, timeout);

    boost::asio::async_read_until(port, boost::asio::dynamic_buffer(buffer), delim,
        [&](error_code ec, size_t n) {
            timer.cancel();
            read_result = ec;
            msglen = n;
        });

    timer.async_wait([&](error_code ec) {
        if (!ec) {
            port.cancel();
            throw exception::read_exception(exception::read_exception::Code::TIME_OUT);
        }
    });

    io.run(); // Hangs here, the second time, not the first

    if (read_result) {
        std::stringstream sstream;
        sstream << read_result;
        throw exception::read_exception(exception::read_exception::Code::PORT, sstream.str());
    }
    return msglen;
}
int main() {
    // set up port options (baud rate etc.)
    for (size_t i = 0; i < 10; i++) {
        std::vector<char> buffer;
        readUntil(buffer, '\n', std::chrono::microseconds(1000));
    }
}

Related

Boost timer immediately expires without going out of scope

I'm working on an RS485 communication class and I'm trying to make a function that reads until a certain char is on the line, but with a timeout. The problem is that my system timer returns immediately, no matter which timeout I enter. I tried changing the timer to be a member variable of the class, so it doesn't go out of scope, but that wasn't the problem. I tried different implementations of timers (deadline_timer mostly) but that didn't help. If I remove the timer from the code, the read succeeds, but when I add it, even if I give it a timeout of 10 seconds (which should be way more than enough), it responds with an immediate timeout.
I tried making a simple version of the class here, but I guess that the options mostly depend on the type of machine you're talking to:
namespace asio = boost::asio;

class RS485CommunicationLayer final {
public:
    RS485CommunicationLayer(
        const std::string& path,
        /* options */
    ) : io(), port(io), timer(port.get_io_service()) {
        open(/* options */);
    }

    std::size_t write(const char* const buffer, const size_t size) {
        /*impl*/
    }

    // THIS FUNCTION --v
    void readUntil(std::vector<char>& buffer, char delim, std::chrono::microseconds timeout) {
        boost::optional<boost::system::error_code> timer_result;
        boost::optional<boost::system::error_code> read_result;

        port.get_io_service().reset();
        timer.expires_from_now(timeout);

        boost::asio::async_read_until(port, asio::dynamic_buffer(buffer), delim,
            [&read_result](const boost::system::error_code& error, size_t) { read_result.reset(error); });
        timer.async_wait(
            [&timer_result](const boost::system::error_code& error) { timer_result.reset(error); });

        while (port.get_io_service().run_one()) {
            if (read_result)
                timer.cancel();
            else if (timer_result)
                port.cancel();
        }

        if (read_result)
            throw boost::system::system_error(*read_result);
    }

private:
    asio::io_context io;
    asio::serial_port port;
    boost::asio::system_timer timer;

    void open(/*args*/) {
        port.open(path);
        /*set options*/
    }
};
Edit:
I also tried the following implementation after finding out that run_for() exists. But then, weirdly enough, the buffer stays empty.
void RS485CommunicationLayer::readUntil(std::vector<char>& buffer, char delim, std::chrono::microseconds timeout) {
    boost::optional<boost::system::error_code> read_result;
    boost::asio::async_read_until(port, asio::dynamic_buffer(buffer), delim,
        [&read_result](const boost::system::error_code& error, size_t) { read_result.reset(error); });
    port.get_io_service().run_for(timeout);
    if (read_result)
        throw boost::system::system_error(*read_result);
}
First off, get_io_service() indicates a Very Old(TM) boost version. Also, it just returns io.
Secondly, why so complicated? I don't even really have the energy to see whether there is a subtle problem with the run_one() loop (it looks fine at a glance).
I'd simplify:
size_t readUntil(std::vector<char>& buffer, char delim,
                 std::chrono::microseconds timeout) {
    error_code read_result;
    size_t msglen = 0;

    io.reset();
    asio::system_timer timer(io, timeout);

    asio::async_read_until(port, asio::dynamic_buffer(buffer), delim,
                           [&](error_code ec, size_t n) {
                               timer.cancel();
                               read_result = ec;
                               msglen = n;
                           });

    timer.async_wait([&](error_code ec) { if (!ec) port.cancel(); });

    io.run();

    if (read_result)
        boost::throw_with_location(boost::system::system_error(read_result),
                                   read_result.location());
    return msglen;
}
You can just cancel the complementary IO object from the respective completion handlers.
The timer is per-op and local to the readUntil, so it doesn't have to be a member.
Let's also throw in the write side, which is all of:
size_t write(char const* const data, const size_t size) {
    return asio::write(port, asio::buffer(data, size));
}
And I can demo it working:
Live On Coliru
#include <boost/asio.hpp>
#include <iomanip>
#include <iostream>

namespace asio = boost::asio;
using boost::system::error_code;
using namespace std::chrono_literals;

class RS485CommunicationLayer final {
public:
    RS485CommunicationLayer(std::string const& path) : io(), port(io) { open(path); }

    size_t write(char const* const data, const size_t size) {
        return asio::write(port, asio::buffer(data, size));
    }

    size_t readUntil(std::vector<char>& buffer, char delim,
                     std::chrono::microseconds timeout) {
        error_code read_result;
        size_t msglen = 0;

        io.reset();
        asio::system_timer timer(io, timeout);

        asio::async_read_until(port, asio::dynamic_buffer(buffer), delim,
                               [&](error_code ec, size_t n) {
                                   timer.cancel();
                                   read_result = ec;
                                   msglen = n;
                               });

        timer.async_wait([&](error_code ec) { if (!ec) port.cancel(); });

        io.run();

        if (read_result)
            boost::throw_with_location(boost::system::system_error(read_result),
                                       read_result.location());
        return msglen;
    }

private:
    asio::io_context io;
    asio::serial_port port;

    void open(std::string path) {
        port.open(path);
        /*set options*/
    }
    void close();
};

int main(int argc, char** argv) {
    RS485CommunicationLayer comm(argc > 1 ? argv[1] : "");

    comm.write("Hello world\n", 12);

    for (std::vector<char> response_buffer;
         auto len = comm.readUntil(response_buffer, '\n', 100ms);) //
    {
        std::cout << "Received " << response_buffer.size() << " bytes, next "
                  << quoted(std::string_view(response_buffer.data(), len - 1))
                  << std::endl;
        // consume
        response_buffer.erase(begin(response_buffer), begin(response_buffer) + len);
    }
}
Demo locally with a socat PTS tunnel:
socat -d -d pty,raw,echo=0 pty,raw,echo=0
And throwing dictionaries at the other end:
while true; do cat /etc/dictionaries-common/words ; done | pv > /dev/pts/10

Need to lock a boost asio tcp socket handler's internal state for multi-thread access?

Objects of Data_t are sent via a TCP socket to a server. The server creates a ConnectionHandler object to handle each incoming connection.
In ConnectionHandler, Data_t objects are read one by one using async_read from the socket, and their num_ fields are summed up and saved to the field ConnectionHandler::total_sum_.
Do I need to lock ConnectionHandler::total_sum_, since multiple threads will write to it?
See the code below. Please note that
ConnectionHandler::received_data_ is re-used as a buffer to hold Data_t objects read from the socket. Is it safe to do so?
ConnectionHandler::process_data() processes a Data_t object first, then calls ConnectionHandler::read_packet() to read from the socket again.
struct Data_t
{
    int num_;
    //... some other data
};

template<typename ConnectionHandler>
class Server {
    using shared_handler_t = std::shared_ptr<ConnectionHandler>;
public:
    Server(int thread_count = 10)
        : thread_count_(thread_count), acceptor_(io_service_) {}

    void
    start_server(uint16_t port) {
        auto handler = std::make_shared<ConnectionHandler>(io_service_);

        // set up the acceptor to listen on the tcp port
        boost::asio::ip::tcp::endpoint endpoint(boost::asio::ip::tcp::v4(), port);
        acceptor_.open(endpoint.protocol());
        acceptor_.bind(endpoint);
        acceptor_.listen();

        acceptor_.async_accept(handler->socket(),
                               [=](boost::system::error_code const &ec) {
                                   handle_new_connection(handler, ec);
                               });

        // start pool of threads to process the asio events
        for (int i = 0; i < thread_count_; ++i) {
            thread_pool_.emplace_back([=] { io_service_.run(); });
        }

        // Wait for all threads in the pool to exit.
        for (std::size_t i = 0; i < thread_pool_.size(); ++i) {
            thread_pool_[i].join();
        }
    }

private:
    void
    handle_new_connection(shared_handler_t handler,
                          boost::system::error_code const &error) {
        if (error) {
            return;
        }
        handler->start();

        auto new_handler = std::make_shared<ConnectionHandler>(io_service_);
        acceptor_.async_accept(new_handler->socket(),
                               [=](boost::system::error_code const &ec) {
                                   handle_new_connection(new_handler, ec);
                               });
    }

    int thread_count_;
    std::vector<std::thread> thread_pool_;
    boost::asio::io_service io_service_;
    boost::asio::ip::tcp::acceptor acceptor_;
};
class ConnectionHandler : public std::enable_shared_from_this<ConnectionHandler> {
public:
    ConnectionHandler(boost::asio::io_service& service)
        : service_(service), socket_(service)
    {
    }

    void
    start()
    {
        read_packet();
    }

    // accessor used by Server::start_server for async_accept
    boost::asio::ip::tcp::socket&
    socket()
    {
        return socket_;
    }

private:
    void
    read_packet()
    {
        auto me = shared_from_this();
        boost::asio::async_read(
            socket_, boost::asio::buffer(&received_data_, sizeof(Data_t)),
            boost::asio::transfer_exactly(sizeof(Data_t)),
            [me](boost::system::error_code const &ec, std::size_t bytes_xfer)
            {
                me->process_data(ec, bytes_xfer);
            });
    }

    void
    process_data(boost::system::error_code const &error,
                 std::size_t bytes_transferred)
    {
        if (error)
        {
            socket_.close();
            return;
        }
        total_sum_ += received_data_.num_;
        read_packet();
    }

    boost::asio::io_service& service_;
    boost::asio::ip::tcp::socket socket_;
    Data_t received_data_;
    int total_sum_;
};

How to make real asynchronous client via boost asio

I need to write a dynamic library which should export three functions:
bool init_sender(const char* ip_addr, int port);
void cleanup_sender();
void send_command(const char* cmd, int len);
init_sender should connect to the server synchronously and return true / false according to whether it succeeded.
cleanup_sender should wait for all commands to complete and only then return.
send_command should send the specified command to the server asynchronously and return as fast as possible.
So I wrote the following code:
boost::asio::io_service g_io_service;
std::unique_ptr<boost::asio::io_service::work> g_work;
boost::asio::ip::tcp::socket g_sock(g_io_service);
boost::thread g_io_service_th;

void io_service_processor()
{
    g_io_service.run();
}

bool __stdcall init_sender(const char* ip_addr, int port)
{
    try
    {
        g_work = std::make_unique<boost::asio::io_service::work>(g_io_service);
        boost::asio::ip::tcp::resolver resolver(g_io_service);
        boost::asio::connect(g_sock, resolver.resolve({ ip_addr, std::to_string(port) }));
        g_io_service_th = boost::thread(io_service_processor);
        return true;
    }
    catch (const std::exception& ex)
    {
        return false;
    }
}

void __stdcall cleanup_sender()
{
    g_work.reset();
    if (g_io_service_th.joinable())
    {
        g_io_service_th.join();
    }
}

void async_write_cb(
    const boost::system::error_code& error,
    std::size_t bytes_transferred)
{
    // TODO: implement
}

void __stdcall send_command(const char* cmd, int len)
{
    boost::asio::async_write(g_sock, boost::asio::buffer(cmd, len), async_write_cb);
}
As far as I know from the Boost Asio documentation, all the commands posted by my async_write calls will be executed from one single thread (the one that contains the run call -- g_io_service_th in my case). Am I right? If so, it doesn't seem fully asynchronous to me. What could I do to change this behavior and send several commands at the same time from several threads? Should I create a boost::thread_group like this
for (int i = 0; i < pool_size; ++i)
{
    _thread_group.create_thread(boost::bind(&boost::asio::io_service::run, &_io_service));
}
or is there any other way?
You're asking a big question and there's a lot to learn. Probably the most important thing to understand is how to use a work object.
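For illustration, here's a minimal sketch of the idea (an editor's example, not part of the original answer):

#include <boost/asio.hpp>
#include <memory>
#include <thread>

int main() {
    boost::asio::io_service io;

    // While the work object lives, io.run() keeps blocking even when the
    // handler queue is momentarily empty.
    auto work = std::make_unique<boost::asio::io_service::work>(io);
    std::thread th([&io] { io.run(); });

    io.post([] { /* handlers run on th */ });

    work.reset(); // release the guard: run() returns once the queue drains
    th.join();
}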
edit: reference to async_write restriction:
http://www.boost.org/doc/libs/1_59_0/doc/html/boost_asio/reference/async_write/overload1.html
quoting from the documentation:
This operation is implemented in terms of zero or more calls to the stream's async_write_some function, and is known as a composed operation. The program must ensure that the stream performs no other write operations (such as async_write, the stream's async_write_some function, or any other composed operations that perform writes) until this operation completes.
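In other words, the following would violate the contract (an illustrative sketch with made-up names, not code from the question):

#include <boost/asio.hpp>
#include <string>

// WRONG: the second composed write starts before the first one's handler has
// fired, so their underlying async_write_some chunks may interleave on the wire.
void overlapping_writes(boost::asio::ip::tcp::socket& sock) {
    static const std::string a(100000, 'a'), b(100000, 'b');
    boost::asio::async_write(sock, boost::asio::buffer(a),
                             [](boost::system::error_code, std::size_t) {});
    boost::asio::async_write(sock, boost::asio::buffer(b),
                             [](boost::system::error_code, std::size_t) {});
}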
Your asio thread code should look something like this:
#include <iostream>
#include <vector>
#include <boost/asio.hpp>
#include <thread>

struct service_loop
{
    using io_service = boost::asio::io_service;

    io_service& get_io_service() {
        return _io_service;
    }

    service_loop(size_t threads = 1)
        : _strand(_io_service)
        , _work(_io_service)
        , _socket(_io_service)
    {
        for(size_t i = 0; i < threads; ++i)
            add_thread();
    }

    ~service_loop() {
        stop();
    }

    // adding buffered sequential writes...
    void write(const char* data, size_t length)
    {
        _strand.dispatch([this, v = std::vector<char>(data, data + length)] {
            _write_buffer.insert(std::end(_write_buffer), v.begin(), v.end());
            check_write();
        });
    }

private:
    std::vector<char> _write_buffer;
    bool _writing = false; // NB: must be initialized, or check_write() reads garbage

    void check_write()
    {
        if (!_writing and !_write_buffer.empty()) {
            auto pv = std::make_shared<std::vector<char>>(std::move(_write_buffer));
            _writing = true;
            _write_buffer.clear();

            boost::asio::async_write(_socket,
                boost::asio::buffer(*pv),
                [this, pv](const boost::system::error_code& ec, size_t written) {
                    _strand.dispatch(std::bind(&service_loop::handle_write,
                                               this,
                                               ec,
                                               written));
                });
        }
    }

    void handle_write(const boost::system::error_code& ec, size_t written)
    {
        _writing = false;
        if (ec) {
            // handle error somehow
        }
        else {
            check_write();
        }
    }

private:
    io_service _io_service;
    io_service::strand _strand;
    io_service::work _work;
    std::vector<std::thread> _threads;
    boost::asio::ip::tcp::socket _socket;

    void add_thread()
    {
        _threads.emplace_back(std::bind(&service_loop::run_thread, this));
    }

    void stop()
    {
        _io_service.stop();
        for(auto& t : _threads) {
            if(t.joinable()) t.join();
        }
    }

    void run_thread()
    {
        while(!_io_service.stopped())
        {
            try {
                _io_service.run();
            }
            catch(const std::exception& e) {
                // report exceptions here
            }
        }
    }
};

using namespace std;

auto main() -> int
{
    service_loop sl;

    sl.write("hello", 5);
    sl.write(" world", 6);

    std::this_thread::sleep_for(std::chrono::seconds(10));
    return 0;
}

boost::acceptor::cancel doesn't work as expected

I've implemented a SocketServer using boost. This SocketServer is intended to work as follows:
accept connections for a certain amount of time
stop handling sockets after timeout is reached
Here is my code:
ServerSocket::ServerSocket(unsigned int port)
{
    _io_service.reset(new boost::asio::io_service());
    _endpoint.reset(new boost::asio::ip::tcp::endpoint(boost::asio::ip::tcp::v4(), port));
    _acceptor.reset(new boost::asio::ip::tcp::acceptor(*_io_service, *_endpoint));
    _newConnection = false;
}

bool ServerSocket::accept(boost::asio::ip::tcp::socket& socket, int timeout)
{
    _newConnection = false;
    _acceptor->async_accept(socket, boost::bind(&ServerSocket::handleAccept, this, &socket));
    _io_service->reset();

    if (timeout > 0)
    {
        int incrementation = 1;
        int time_spent = 0;
        while (time_spent < timeout && !_io_service->poll_one())
        {
            time_spent += incrementation;
            sleep(incrementation);
        }
    }
    else
    {
        _io_service->run_one();
    }

    if (!_newConnection)
    {
        _acceptor->cancel();
    }
    return _newConnection;
}

void ServerSocket::handleAccept(boost::asio::ip::tcp::socket* pSocket)
{
    _newConnection = true;
}
My problem is the following: I call accept() with a timeout and a socket A. If the timeout is reached and I then call accept() again with a new socket B, when a connection finally arrives it is accepted on A instead of B.
Tell me if there is information missing.
You can simply use the io_service's task loop, and use a deadline timer to cancel the operation on the acceptor.
Live On Coliru
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <iostream>

using namespace boost;
using asio::ip::tcp;

struct ServerSocket
{
    asio::io_service _io_service;
    tcp::endpoint _endpoint;
    tcp::acceptor _acceptor;
    asio::deadline_timer _timer;
    bool _newConnection;

    ServerSocket(unsigned int port)
        : _io_service(),
          _endpoint(tcp::v4(), port),
          _acceptor(_io_service, _endpoint),
          _timer(_io_service),
          _newConnection(false)
    {
    }

    void timer_expired(boost::system::error_code ec)
    {
        if (!ec)
        {
            _acceptor.cancel();
        }
    }

    bool accept(boost::asio::ip::tcp::socket& socket, int timeout)
    {
        _newConnection = false;
        _io_service.reset();

        _timer.expires_from_now(boost::posix_time::seconds(timeout));
        _timer.async_wait(boost::bind(&ServerSocket::timer_expired, this, asio::placeholders::error));

        _acceptor.async_accept(socket, boost::bind(&ServerSocket::handleAccept, this, &socket, asio::placeholders::error));

        _io_service.run();

        return _newConnection;
    }

    void handleAccept(boost::asio::ip::tcp::socket* pSocket, boost::system::error_code ec)
    {
        if (!ec)
        {
            _timer.cancel();
            _newConnection = true;
        }
    }
};

int main()
{
    ServerSocket s(6767);

    tcp::socket socket(s._io_service);
    if (s.accept(socket, 3))
        std::cout << "Accepted connection from " << socket.remote_endpoint() << "\n";
    else
        std::cout << "Timeout expired\n";
}
You should probably explicitly check for the operation_aborted error code instead of just doing !ec, but I'll leave that as an exercise for the reader.
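For example, a sketch of that check, as a drop-in for handleAccept in the class above:

void handleAccept(boost::asio::ip::tcp::socket* pSocket, boost::system::error_code ec)
{
    if (ec == boost::asio::error::operation_aborted)
        return; // the deadline timer fired and cancelled the accept
    if (!ec)
    {
        _timer.cancel();
        _newConnection = true;
    }
}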

Boost Asio async_read sometimes hangs while reading but not always

I am implementing a small distributed system that consists of N machines. Each of them receives some data from some remote server and then propagates the data to the other n-1 fellow machines. I am using the Boost Asio async_read and async_write to implement this. I set up a test cluster of N=30 machines. When I tried smaller datasets (receiving 75KB to 750KB per machine), the program always worked. But when I moved on to a just slightly larger dataset (7.5MB), I observed strange behavior: at the beginning, reads and writes happened as expected, but after a while, some machines hung while others finished; the number of machines that hung varied with each run. I tried to print out some messages in each handler and found that for the machines that hung, async_read basically could not successfully read after a while, so nothing could proceed afterwards. I checked the remote servers, and they had all finished writing. I have tried using a strand to control the order of execution of the async reads and writes, and I also tried using different io_services for read and write. Neither solved the problem. I am pretty desperate. Can anyone help me?
Here is the code for the class that does the read and propagation:
const int TRANS_TUPLE_SIZE=15;
const int TRANS_BUFFER_SIZE=5120/TRANS_TUPLE_SIZE*TRANS_TUPLE_SIZE;

class Asio_Trans_Broadcaster
{
private:
    char buffer[TRANS_BUFFER_SIZE];
    int node_id;
    int mpi_size;
    int mpi_rank;
    boost::asio::ip::tcp::socket* dbsocket;
    boost::asio::ip::tcp::socket** sender_sockets;
    int n_send;
    boost::mutex mutex;
    bool done;

public:
    Asio_Trans_Broadcaster(boost::asio::ip::tcp::socket* dbskt, boost::asio::ip::tcp::socket** senderskts,
                           int msize, int mrank, int id)
    {
        dbsocket=dbskt;
        count=0;
        node_id=id;
        mpi_size=mpi_rank=-1;
        sender_sockets=senderskts;
        mpi_size=msize;
        mpi_rank=mrank;
        n_send=-1;
        done=false;
    }

    static std::size_t completion_condition(const boost::system::error_code& error, std::size_t bytes_transferred)
    {
        int remain=bytes_transferred%TRANS_TUPLE_SIZE;
        if(remain==0 && bytes_transferred>0)
            return 0;
        else
            return TRANS_BUFFER_SIZE-bytes_transferred;
    }

    void write_handler(const boost::system::error_code &ec, std::size_t bytes_transferred)
    {
        int n=-1;
        mutex.lock();
        n_send--;
        n=n_send;
        mutex.unlock();

        fprintf(stdout, "~~~~~~ #%d, write_handler: %d bytes, copies_to_send: %d\n",
                node_id, bytes_transferred, n);

        if(n==0 && !done)
            boost::asio::async_read(*dbsocket,
                boost::asio::buffer(buffer, TRANS_BUFFER_SIZE),
                Asio_Trans_Broadcaster::completion_condition,
                boost::bind(&Asio_Trans_Broadcaster::broadcast_handler, this,
                            boost::asio::placeholders::error,
                            boost::asio::placeholders::bytes_transferred));
    }

    void broadcast_handler(const boost::system::error_code &ec, std::size_t bytes_transferred)
    {
        fprintf(stdout, "#%d, broadcast_handler: %d bytes, mpi_size:%d, mpi_rank: %d\n", node_id, bytes_transferred, mpi_size, mpi_rank);
        if (!ec)
        {
            int pos=0;
            while(pos<bytes_transferred && pos<TRANS_BUFFER_SIZE)
            {
                int id=-1;
                memcpy(&id, &buffer[pos], 4);
                if(id<0)
                {
                    done=true;
                    fprintf(stdout, "#%d, broadcast_handler: done!\n", mpi_rank);
                    break;
                }
                pos+=TRANS_TUPLE_SIZE;
            }

            mutex.lock();
            n_send=mpi_size-1;
            mutex.unlock();

            for(int i=0; i<mpi_size; i++)
                if(i!=mpi_rank)
                {
                    boost::asio::async_write(*sender_sockets[i], boost::asio::buffer(buffer, bytes_transferred),
                        boost::bind(&Asio_Trans_Broadcaster::write_handler, this,
                                    boost::asio::placeholders::error,
                                    boost::asio::placeholders::bytes_transferred));
                }
        }
        else
        {
            cerr<<mpi_rank<<" error: "<<ec.message()<<endl;
            delete this;
        }
    }

    void broadcast()
    {
        boost::asio::async_read(*dbsocket,
            boost::asio::buffer(buffer, TRANS_BUFFER_SIZE),
            Asio_Trans_Broadcaster::completion_condition,
            boost::bind(&Asio_Trans_Broadcaster::broadcast_handler, this,
                        boost::asio::placeholders::error,
                        boost::asio::placeholders::bytes_transferred));
    }
};
Here is the main code running on each machine:
int N=30;
boost::asio::io_service* sender_io_service=new boost::asio::io_service();
boost::asio::io_service::work* p_work=new boost::asio::io_service::work(*sender_io_service);

boost::thread_group send_thread_pool;
for(int i=0; i<NUM_THREADS; i++)
{
    send_thread_pool.create_thread(boost::bind(&boost::asio::io_service::run, sender_io_service));
}

boost::asio::io_service* receiver_io_service=new boost::asio::io_service();
shared_ptr<boost::asio::io_service::work> p_work2(new boost::asio::io_service::work(*receiver_io_service));

boost::thread_group thread_pool2;
thread_pool2.create_thread(boost::bind(&boost::asio::io_service::run, receiver_io_service));

boost::asio::ip::tcp::socket* receiver_socket;
// establish nonblocking connection with remote server
AsioConnectToRemote(5000, 1, receiver_io_service, receiver_socket, true);

boost::asio::ip::tcp::socket* send_sockets[N];
// establish blocking connection with other machines
hadoopNodes = SetupAsioConnectionsWIthOthers(sender_io_service, send_sockets, hostFileName, mpi_rank, mpi_size, 3000, false);

Asio_Trans_Broadcaster* db_receiver=new Asio_Trans_Broadcaster(receiver_socket, send_sockets,
                                                               mpi_size, mpi_rank, mpi_rank);
db_receiver->broadcast();

p_work2.reset();
thread_pool2.join_all();
delete p_work;
send_thread_pool.join_all();
I don't know what your code is trying to achieve. There are too many missing bits.
Of course, if the task is to asynchronously send/receive traffic on network sockets, Asio is just the thing for that. It's hard to see what's special about your code.
I'd suggest cleaning up the more obvious problems:
there's (almost) no error handling (check your error_code-s!)
unless you're on a funny platform, your format strings should use %lu for size_t
why do you mess around with raw arrays, with possibly bad sizes, when you can just have a vector?
never assume the size of objects if you can use sizeof:
memcpy(&id, &trans_buffer[pos], sizeof(id));
come to think of it, it looks like the indexing of buffer is unsafe anyways:
while(pos < bytes_transferred && pos < TRANS_BUFFER_SIZE)
{
    int id = -1;
    memcpy(&id, &buffer[pos], sizeof(id));
If e.g. pos == TRANS_BUFFER_SIZE-1 here, the memcpy invokes Undefined Behaviour...
why is there so much new going on? You're inviting a hairy class of bugs into your code. As if memory management wasn't the Achilles heel of low-level coding. Use values, or shared pointers. Never delete this. Ever[1]
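For instance, a minimal sketch of the shared_ptr alternative (names like Session are made up for illustration; this is not the answer's code):

#include <boost/asio.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/make_shared.hpp>

struct Session : boost::enable_shared_from_this<Session> {
    explicit Session(boost::asio::io_service& io) : timer_(io) {}

    void start() {
        auto self = shared_from_this(); // the pending handler keeps *this alive
        timer_.expires_from_now(boost::posix_time::seconds(1));
        timer_.async_wait([self](boost::system::error_code) {
            // when the last copy of `self` goes away, the Session is destroyed
            // automatically -- no `delete this` anywhere
        });
    }

    boost::asio::deadline_timer timer_;
};

int main() {
    boost::asio::io_service io;
    boost::make_shared<Session>(io)->start();
    io.run();
}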
why is there so much repeated code? Why is one thread pool named after the senders and the other called thread_pool2, which contains one thread? Eh? Why do you have one work item as a raw pointer, and the other as a shared_ptr?
You could just use:
struct service_wrap {
    service_wrap(int threads)
        : work(boost::asio::io_service::work(io_service)) // engage the work guard up front
    {
        while(threads--)
            pool.create_thread(boost::bind(&boost::asio::io_service::run, boost::ref(io_service)));
    }

    ~service_wrap() {
        io_service.post(boost::bind(&service_wrap::stop, this));
        pool.join_all();
    }

private: // mind the initialization order!
    boost::asio::io_service io_service;
    boost::optional<boost::asio::io_service::work> work;
    boost::thread_group pool;

    void stop() {
        work = boost::none;
    }
};
So you can simply write:
service_wrap senders(NUM_THREADS);
service_wrap receivers(1);
Wow. Did you see that? No more chance of error. If you fix one pool, you fix the other automatically. No more delete-ing the first work item and .reset()-ing the second. In short: no more messy code, and less complexity.
Use exception-safe locking guards:

int local_n_send = -1; // not clear naming
{
    boost::lock_guard<boost::mutex> lk(mutex);
    n_send--;
    local_n_send = n_send;
}
the body of broadcast is completely repeated in write_handler(). Why not just call it:
if(local_n_send == 0 && !done)
    broadcast();
I think there's still a race condition - not a data race on the access to n_send itself, but the decision to re-broadcast might be wrong if n_send reaches zero after the lock is released. Now, since broadcast() only does an async operation, you can just do it under the lock and get rid of the race condition:
void write_handler(const error_code &ec, size_t bytes_transferred) {
    boost::lock_guard<boost::mutex> lk(mutex);
    if(!(done || --n_send))
        broadcast();
}
Woop woop. That's three lines of code now. Less code is less bugs.
My guess would be that if you diligently scrub the code like this, you will inevitably find your clues. Think of it like you would look for a lost wedding-ring: you wouldn't leave a mess lying around. Instead, you'd go from room to room and tidy it all up. Throw everything "out" first if need be.
Iff you can make this thing self-contained /and/ reproducible, I'll even debug it further for you!
Cheers
Here's a starting point that I made while looking at the code: Compiling on Coliru
#include <boost/asio.hpp>
#include <boost/thread.hpp>
#include <boost/array.hpp>
#include <boost/make_shared.hpp>
#include <boost/ptr_container/ptr_vector.hpp>
#include <cstdio>
#include <cstring>
#include <iostream>

const/*expr*/ int TRANS_TUPLE_SIZE  = 15;
const/*expr*/ int TRANS_BUFFER_SIZE = 5120 / TRANS_TUPLE_SIZE * TRANS_TUPLE_SIZE;

namespace AsioTrans
{
    using boost::system::error_code;
    using namespace boost::asio;

    typedef ip::tcp::socket             socket_t;
    typedef boost::ptr_vector<socket_t> socket_list;

    class Broadcaster
    {
    private:
        boost::array<char, TRANS_BUFFER_SIZE> trans_buffer;

        int node_id;
        int mpi_rank;

        socket_t&    dbsocket;
        socket_list& sender_sockets;

        int n_send;
        boost::mutex mutex;
        bool done;

    public:
        Broadcaster(
            socket_t& dbskt,
            socket_list& senderskts,
            int mrank,
            int id) :
                node_id(id),
                mpi_rank(mrank),
                dbsocket(dbskt),
                sender_sockets(senderskts),
                n_send(-1),
                done(false)
        {
            // count=0;
        }

        static size_t completion_condition(const error_code& error, size_t bytes_transferred)
        {
            // TODO FIXME handle the error_code here
            int remain = bytes_transferred % TRANS_TUPLE_SIZE;

            if(bytes_transferred && !remain)
            {
                return 0;
            }
            else
            {
                return TRANS_BUFFER_SIZE - bytes_transferred;
            }
        }

        void write_handler(const error_code& ec, size_t bytes_transferred)
        {
            // TODO handle errors
            // TODO check bytes_transferred
            boost::lock_guard<boost::mutex> lk(mutex);
            if(!(done || --n_send))
                broadcast();
        }

        void broadcast_handler(const error_code& ec, size_t bytes_transferred)
        {
            fprintf(stdout, "#%d, broadcast_handler: %lu bytes, mpi_size:%lu, mpi_rank: %d\n", node_id, bytes_transferred, sender_sockets.size(), mpi_rank);

            if(!ec)
            {
                for(size_t pos = 0; (pos < bytes_transferred && pos < TRANS_BUFFER_SIZE); pos += TRANS_TUPLE_SIZE)
                {
                    int id = -1;
                    memcpy(&id, &trans_buffer[pos], sizeof(id));

                    if(id < 0)
                    {
                        done = true;
                        fprintf(stdout, "#%d, broadcast_handler: done!\n", mpi_rank);
                        break;
                    }
                }

                {
                    boost::lock_guard<boost::mutex> lk(mutex);
                    n_send = sender_sockets.size() - 1;
                }

                for(int i = 0; size_t(i) < sender_sockets.size(); i++)
                {
                    if(i != mpi_rank)
                    {
                        async_write(
                            sender_sockets[i],
                            buffer(trans_buffer, bytes_transferred),
                            boost::bind(&Broadcaster::write_handler, this, placeholders::error, placeholders::bytes_transferred));
                    }
                }
            }
            else
            {
                std::cerr << mpi_rank << " error: " << ec.message() << std::endl;
                delete this;
            }
        }

        void broadcast()
        {
            async_read(
                dbsocket,
                buffer(trans_buffer),
                Broadcaster::completion_condition,
                boost::bind(&Broadcaster::broadcast_handler, this,
                            placeholders::error,
                            placeholders::bytes_transferred));
        }
    };

    struct service_wrap {
        service_wrap(int threads)
            : _work(io_service::work(_service)) // engage the work guard up front
        {
            while(threads--)
                _pool.create_thread(boost::bind(&io_service::run, boost::ref(_service)));
        }

        ~service_wrap() {
            _service.post(boost::bind(&service_wrap::stop, this));
            _pool.join_all();
        }

        io_service& service() { return _service; }

    private: // mind the initialization order!
        io_service                        _service;
        boost::optional<io_service::work> _work;
        boost::thread_group               _pool;

        void stop() {
            _work = boost::none;
        }
    };

    extern void AsioConnectToRemote(int, int, io_service&, socket_t&, bool);
    extern void SetupAsioConnectionsWIthOthers(io_service&, socket_list&, std::string, int, bool);
}

int main()
{
    using namespace AsioTrans;

    // there's no use in increasing #threads unless there are blocking operations
    service_wrap senders(boost::thread::hardware_concurrency());
    service_wrap receivers(1);

    socket_t receiver_socket(receivers.service());
    AsioConnectToRemote(5000, 1, receivers.service(), receiver_socket, true);

    socket_list send_sockets(30);
    /*hadoopNodes =*/ SetupAsioConnectionsWIthOthers(senders.service(), send_sockets, "hostFileName", 3000, false);

    int mpi_rank = send_sockets.size();
    AsioTrans::Broadcaster db_receiver(receiver_socket, send_sockets, mpi_rank, mpi_rank);
    db_receiver.broadcast();
}
[1] No exceptions. Except when there's an exception to the no-exceptions rule. Exception-ception.