async_receive_from stops receiving after a few packets under Linux - c++

I have a setup with multiple peers broadcasting udp packets (containing images) every 200ms (5fps).
While receiving both the local stream and the external streams works fine under Windows, the same code (apart from the socket->cancel() that cannot be used on Windows XP, see the comment in the code) produces rather strange behavior under Linux:
The first few (5~7) packets sent by another machine (when this machine starts streaming) are received as expected;
After this, the packets from the other machine are received at irregular, long intervals (12s, 5s, 17s, ...) or time out (the timeout is set to 20 seconds). At certain moments there is again a burst of 3~4 packets received as expected.
The packets sent by the machine itself are still being received as expected.
Using Wireshark, I see both the local and the external packets arriving as they should, with correct time intervals between consecutive packets. The behavior also presents itself when the local machine is listening to only a single other stream, with the local stream disabled.
This is some code from the receiver (with some updates as suggested below, thanks!):
Receiver::Receiver(port p)
{
    this->port = p;
    this->stop = false;
}

int Receiver::run()
{
    io_service io_service;
    boost::asio::ip::udp::socket socket(
        io_service,
        boost::asio::ip::udp::endpoint(boost::asio::ip::udp::v4(),
                                       this->port));
    while(!stop)
    {
        const int bufflength = 65000;
        int timeout = 20000;
        char sockdata[bufflength];
        boost::asio::ip::udp::endpoint remote_endpoint;
        int rcvd;
        bool read_success = this->receive_with_timeout(
            sockdata, bufflength, &rcvd, &socket, remote_endpoint, timeout);
        if(read_success)
        {
            std::cout << "read success " << remote_endpoint.address().to_string() << std::endl;
        }
        else
        {
            std::cout << "read fail" << std::endl;
        }
    }
    return 0;
}

void handle_receive_from(
    bool* toset, boost::system::error_code error, size_t length, int* outsize)
{
    if(!error || error == boost::asio::error::message_size)
    {
        *toset = length > 0;
        *outsize = length;
    }
    else
    {
        std::cout << error.message() << std::endl;
    }
}

// Update: error check
void handle_timeout( bool* toset, boost::system::error_code error)
{
    if(!error)
    {
        *toset = true;
    }
    else
    {
        std::cout << error.message() << std::endl;
    }
}

bool Receiver::receive_with_timeout(
    char* data, int buffl, int* outsize,
    boost::asio::ip::udp::socket *socket,
    boost::asio::ip::udp::endpoint &sender_endpoint, int msec_tout)
{
    bool timer_overflow = false;
    bool read_result = false;
    deadline_timer timer( socket->get_io_service() );
    timer.expires_from_now( boost::posix_time::milliseconds(msec_tout) );
    timer.async_wait( boost::bind(&handle_timeout, &timer_overflow,
        boost::asio::placeholders::error) );
    socket->async_receive_from(
        boost::asio::buffer(data, buffl), sender_endpoint,
        boost::bind(&handle_receive_from, &read_result,
            boost::asio::placeholders::error,
            boost::asio::placeholders::bytes_transferred, outsize));
    socket->get_io_service().reset();
    while ( socket->get_io_service().run_one() )
    {
        if ( read_result )
        {
            timer.cancel();
        }
        else if ( timer_overflow )
        {
            // not to be used on Windows XP, Windows Server 2003, or earlier
            socket->cancel();
            // Update: added run_one()
            socket->get_io_service().run_one();
        }
    }
    // Update: added run_one()
    socket->get_io_service().run_one();
    return read_result;
}
When the timer exceeds the 20 seconds, the error message "Operation canceled" is returned, but it is difficult to get any other information about what is going on.
Can anyone identify a problem or give me some hints to get some more information about what is going wrong? Any help is appreciated.

Okay, what you're doing is that when you call receive_with_timeout, you're setting up the two asynchronous requests (one for the recv, one for the timeout). When the first one completes, you cancel the other.
However, you never invoke io_service::run_one() again to allow its callback to complete. When you cancel an operation in boost::asio, it still invokes the handler, usually with an error code indicating that the operation has been aborted or canceled. In this case, I believe you are left with a dangling handler once the deadline_timer is destroyed, since the handler holds a pointer to a stack variable in which it stores its result.
The solution is to call run_one() again to process the canceled callback result prior to exiting the function. You should also check the error code being passed to your timeout handler, and only treat it as a timeout if there was no error.
Also, in the case where you do have a timeout, you need to execute run_one() so that the async_receive_from handler can run and report that it was canceled.
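To make the intent explicit, here is a minimal sketch of the end of receive_with_timeout following that advice (assuming the same locals and handlers as in the question's code); the loop keeps pumping the io_service until both handlers have executed, so neither is left dangling with pointers to stack variables:
// Sketch only: run until BOTH handlers (read and timer) have executed.
// Cancelling one operation still queues its handler, now carrying
// boost::asio::error::operation_aborted, so it must be processed here
// before read_result/timer_overflow go out of scope.
std::size_t handlers_run = 0;
socket->get_io_service().reset();
while (handlers_run < 2)
{
    if (socket->get_io_service().run_one() == 0)
        break;                      // no more work: io_service stopped
    ++handlers_run;
    if (read_result)
        timer.cancel();             // timer handler will still run (aborted)
    else if (timer_overflow)
        socket->cancel();           // wakes the pending read handler (aborted)
}
return read_result;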

After a clean installation of Xubuntu 12.04, instead of the old Ubuntu 10.04 install, everything now works as expected. Maybe it is because the new install runs a newer kernel, probably with improved networking? Anyway, a re-install with a newer version of the distribution solved my problem.
If anyone else gets unexpected network behavior with an older kernel, I would advise trying it on a system with a newer kernel installed.

Related

asio async operations aren't processed

I am following ASIO's async_tcp_echo_server.cpp example to write a server.
My server logic looks like this (.cpp part):
1. Server startup:
bool Server::Start()
{
mServerThread = std::thread(&Server::ServerThreadFunc, this, std::ref(ios));
//ios is asio::io_service
}
2. Init the acceptor and listen for incoming connections:
void Server::ServerThreadFunc(io_service& service)
{
tcp::endpoint endp{ address::from_string(LOCAL_HOST),MY_PORT };
mAcceptor = acceptor_ptr(new tcp::acceptor{ service,endp });
// Add a job to start accepting connections.
StartAccept(*mAcceptor);
// Process the event loop. Hang here till the service is terminated.
service.run();
std::cout << "Server thread exiting." << std::endl;
}
3. Accept a connection and start reading from the client:
void Server::StartAccept(tcp::acceptor& acceptor)
{
acceptor.async_accept([&](std::error_code err, tcp::socket socket)
{
if (!err)
{
std::make_shared<Connection>(std::move(socket))->StartRead(mCounter);
StartAccept(acceptor);
}
else
{
std::cerr << "Error:" << "Failed to accept new connection" << err.message() << std::endl;
return;
}
});
}
void Connection::StartRead(uint32_t frameIndex)
{
asio::async_read(mSocket, asio::buffer(&mHeader, sizeof(XHeader)), std::bind(&Connection::ReadHandler, shared_from_this(), std::placeholders::_1, std::placeholders::_2, frameIndex));
}
So the Connection instance finally triggers the ReadHandler callback, where I perform the actual read and write:
void Connection::ReadHandler(const asio::error_code& error, size_t bytes_transfered, uint32_t frameIndex)
{
if (bytes_transfered == sizeof(XHeader))
{
uint32_t reply;
if (mHeader.code == 12345)
{
reply = (uint32_t)12121;
size_t len = asio::write(mSocket, asio::buffer(&reply, sizeof(uint32_t)));
}
else
{
reply = (uint32_t)0;
size_t len = asio::write(mSocket, asio::buffer(&reply, sizeof(uint32_t)));
this->mSocket.shutdown(tcp::socket::shutdown_both);
return;
}
}
while (mSocket.is_open())
{
XPacket packet;
packet.dataSize = rt->buff.size();
packet.data = rt->buff.data();
std::vector<asio::const_buffer> buffers;
buffers.push_back(asio::buffer(&packet.dataSize,sizeof(uint64_t)));
buffers.push_back(asio::buffer(packet.data, packet.dataSize));
auto self(shared_from_this());
asio::async_write(mSocket, buffers,
[this, self](const asio::error_code error, size_t bytes_transfered)
{
if (error)
{
ERROR(200, "Error sending packet");
ERROR(200, error.message().c_str());
}
}
);
}
}
Now, here is the problem. The server receives data from the client and sends replies fine using synchronous asio::write. But when it comes to asio::async_read or asio::async_write inside the while loop, the method's lambda callback never gets triggered, unless I put io_context().run_one(); immediately after it. I don't understand why I see this behaviour. I do call io_service.run() right after the acceptor init, so it blocks there till the server exits. The only difference between my code and the asio example, as far as I can tell, is that I run my logic from a custom thread.
Your callback isn't returning: the while (mSocket.is_open()) loop in ReadHandler spins forever, preventing the event loop from executing other handlers (including the completion handlers of the async_write calls queued inside it).
In general, if you want an asynchronous flow, you would chain callbacks e.g. callback checks is_open(), and if true calls async_write() with itself as the callback.
In either case, the callback returns.
This allows the event loop to run, calling your callback, and so on.
In short, you should make sure your asynchronous callbacks always return in a reasonable time frame.
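For illustration, a rough sketch of that chained style applied to the write side of the question's Connection class (SendNext and the packet members are hypothetical names, not part of the original code); each completion handler starts the next operation and then returns:
// Hypothetical helper illustrating chained callbacks instead of a while loop.
// Assumes Connection stores the outgoing packet in members (mPacketSize,
// mPacketData) so the buffers outlive the asynchronous operation.
void Connection::SendNext()
{
    if (!mSocket.is_open())
        return;                       // chain ends here

    std::vector<asio::const_buffer> buffers;
    buffers.push_back(asio::buffer(&mPacketSize, sizeof(uint64_t)));
    buffers.push_back(asio::buffer(mPacketData.data(), mPacketData.size()));

    auto self(shared_from_this());
    asio::async_write(mSocket, buffers,
        [this, self](const asio::error_code error, size_t /*bytes_transferred*/)
        {
            if (error)
            {
                std::cerr << "Error sending packet: " << error.message() << std::endl;
                return;
            }
            // Prepare the next packet in the members here, then continue the chain.
            SendNext();
        });
    // SendNext() returns immediately after starting the write, so the
    // io_service thread is free to dispatch this and other completions.
}
ReadHandler would then call SendNext() once (instead of looping) and return, letting io_service::run() drive the rest.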

boost::asio write: Broken pipe

I have a TCP server that handles new connections; when there's a new connection, two threads are created (std::thread, detached).
void Gateway::startServer(boost::asio::io_service& io_service, unsigned short port) {
tcp::acceptor TCPAcceptor(io_service, tcp::endpoint(tcp::v4(), port));
bool UARTToWiFiGatewayStarted = false;
for (;;) { std::cout << "\nstartServer()\n";
auto socket(std::shared_ptr<tcp::socket>(new tcp::socket(io_service)));
/*!
* Accept a new connected WiFi client.
*/
TCPAcceptor.accept(*socket);
socket->set_option( tcp::no_delay( true ) );
// This will set the boolean `Gateway::communicationSessionStatus` variable to true.
Gateway::enableCommunicationSession();
// start one thread
std::thread(WiFiToUARTWorkerSession, socket, this->SpecialUARTPort, this->SpecialUARTPortBaud).detach();
// start the second thread
std::thread(UARTToWifiWorkerSession, socket, this->UARTport, this->UARTbaud).detach();
}
}
The first of the two worker functions looks like this (here I'm reading using the shared socket):
void Gateway::WiFiToUARTWorkerSession(std::shared_ptr<tcp::socket> socket, std::string SpecialUARTPort, unsigned int baud) {
std::cout << "\nEntered: WiFiToUARTWorkerSession(...)\n";
std::shared_ptr<FastUARTIOHandler> uart(new FastUARTIOHandler(SpecialUARTPort, baud));
try {
while(true == Gateway::communicationSessionStatus) { std::cout << "WiFi->UART\n";
unsigned char WiFiDataBuffer[max_incoming_wifi_data_length];
boost::system::error_code error;
/*!
* Read the TCP data.
*/
size_t length = socket->read_some(boost::asio::buffer(WiFiDataBuffer), error);
/*!
* Handle possible read errors.
*/
if (error == boost::asio::error::eof) {
// this will set the shared boolean variable from "true" to "false", causing the while loop (from the both functions and threads) to stop.
Gateway::disableCommunicationSession();
break; // Connection closed cleanly by peer.
}
else if (error) {
Gateway::disableCommunicationSession();
throw boost::system::system_error(error); // Some other error.
}
uart->write(WiFiDataBuffer, length);
}
}
catch (std::exception &exception) {
std::cerr << "[APP::exception] Exception in thread: " << exception.what() << std::endl;
}
std::cout << "\nExiting: WiFiToUARTWorkerSession(...)\n";
}
And the second one (here I'm writing using the thread-shared socket):
void Gateway::UARTToWifiWorkerSession(std::shared_ptr<tcp::socket> socket, std::string UARTport, unsigned int baud) {
std::cout << "\nEntered: UARTToWifiWorkerSession(...)\n";
/*!
* Buffer used for storing the UART-incoming data.
*/
unsigned char UARTDataBuffer[max_incoming_uart_data_length];
std::vector<unsigned char> outputBuffer;
std::shared_ptr<FastUARTIOHandler> uartHandler(new FastUARTIOHandler(UARTport, baud));
while(true == Gateway::communicationSessionStatus) { std::cout << "UART->WiFi\n";
/*!
* Read the UART-available data.
*/
auto bytesReceived = uartHandler->read(UARTDataBuffer, max_incoming_uart_data_length);
/*!
* If there was some data, send it over TCP.
*/
if(bytesReceived > 0) {
boost::asio::write((*socket), boost::asio::buffer(UARTDataBuffer, bytesReceived));
std::cout << "\nSending data to app...\n";
}
}
std::cout << "\nExited: UARTToWifiWorkerSession(...)\n";
}
To stop these two threads I do the following: in the WiFiToUARTWorkerSession(...) function, if the read(...) fails (with boost::asio::error::eof or any other error), I set the Gateway::communicationSessionStatus boolean switch (which is shared by both functions) to false; this way both functions should return and the threads should exit gracefully.
When I connect for the first time, this works well, but when I disconnect from the server, the execution flow in WiFiToUARTWorkerSession(...) goes through the else if (error) branch, sets the while condition variable to false, and then throws boost::system::system_error(error) (which actually means Connection reset by peer).
Then when I try to connect again, I get the following exception and the program terminates:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >'
what(): write: Broken pipe
What could be the problem?
EDIT: From what I found about this error, it seems that I write(...) after the client disconnects, but how could this be possible?
EDIT2: I have debugged the code some more and it seems that one thread (the one running the UARTToWifiWorkerSession(...) function) doesn't actually exit, because the execution flow stops at a blocking read(...) call. That thread hangs until the read(...) receives some data, and when I reconnect another two threads are created, which causes data races.
Can someone confirm that this could be the problem?
The actual problem was that the function UARTToWifiWorkerSession(...) didn't actually exit (because of the blocking read(...) call), causing two threads (the hanging one and one of the two newly created ones) to write(...) to the same socket without any concurrency control.
The solution was to set a read(...) timeout, so the function can return (and the thread can exit) instead of blocking indefinitely on input.
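Since FastUARTIOHandler is a custom class, the exact timeout mechanism is not shown here; below is just one possible sketch using select() on the underlying POSIX descriptor (the fd accessor is a hypothetical addition), so the loop can periodically re-check Gateway::communicationSessionStatus instead of blocking forever:
// Sketch of a read with timeout on a POSIX file descriptor.
#include <sys/select.h>
#include <unistd.h>

ssize_t read_with_timeout(int fd, unsigned char* buffer, size_t maxLength, int timeoutMs)
{
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);

    timeval tv;
    tv.tv_sec  = timeoutMs / 1000;
    tv.tv_usec = (timeoutMs % 1000) * 1000;

    int ready = select(fd + 1, &readSet, nullptr, nullptr, &tv);
    if (ready <= 0)
        return 0;                   // timed out (or select failed): no data this round

    return ::read(fd, buffer, maxLength);
}
With such a helper, a return value of 0 simply means "try again", so the while (Gateway::communicationSessionStatus) condition is re-evaluated regularly and the thread can exit once the flag is cleared.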

Boost.Asio: Why is the timer executed only once?

I have a function called read_packet. This function blocks until either a connection request arrives or the timer fires.
The code is the following:
std::size_t read_packet(const std::chrono::milliseconds& timeout,
                        boost::system::error_code& error)
{
    // m_timer_ --> boost::asio::high_resolution_timer
    if(!m_is_first_time_) {
        m_is_first_time_ = true;
        // Set an expiry time relative to now.
        m_timer_.expires_from_now( timeout );
    } else {
        m_timer_.expires_at( m_timer_.expires_at() + timeout );
    }
    // Start an asynchronous wait.
    m_timer_.async_wait(
        [ this ](const boost::system::error_code& error){
            if(!error) m_is_timeout_signaled_ = true;
        }
    );
    auto result = m_io_service_.run_one();
    if( !m_is_timeout_signaled_ ) {
        m_timer_.cancel();
    }
    m_io_service_.reset();
    return result;
}
The function works correctly as long as no connection request is received. All connection requests are accepted asynchronously.
After accepting a connection, the run_one() function no longer remains blocked for the time set by the timer. The function always returns 1 (one handler has been processed). This handler corresponds to the timer.
I do not understand why this situation occurs.
Why is the function not blocked for the time set on the timer?
Cheers.
NOTE: This function is used in a loop.
UPDATE:
I have my own io_service::run() function. This function performs other actions and tasks. I want to listen and process the network level for a period of time:
If something comes on the network level, io_service::run_one() returns and read_packet() returns the control to my run() function.
Otherwise, the timer is fired and read_packet() returns the control to my run() function.
Everything that comes from the network level is stored in a data structure. Then my run() function operates on that data structure.
It also runs other options.
void run(duration timeout, boost::system::error_code& error)
{
    time_point start = clock_type::now();
    time_point deadline = start + timeout;
    while( !stop() ) {
        read_packet(timeout, error);
        if(error) return;
        if(is_timeout_expired( start, deadline, timeout )) return;
        // processing network level
        // other actions
    }
}
In my case, the sockets are always active until a client requests the closing of the connection.
During one time slot you manage the network level, and during another slot you do other things.
After reading the question more closely I got the idea that you are actually trying to use Asio to get synchronous IO, but with a timeout on each read operation.
That's not what Asio was intended for (hence, the name "Asynchronous IO Library").
But sure, you can do it if you insist. Like I said, I feel you're overcomplicating things.
In the completion handler of your timer, just cancel the socket operation if the timer had expired. (Note that if it didn't, you'll get operation_aborted, so check the error code).
Small self-contained example (which is what you should always provide when trying to get help, by the way):
Live On Coliru
#include <boost/asio.hpp>
#include <boost/asio/high_resolution_timer.hpp>
#include <iostream>

struct Program {
    Program() { sock_.connect({ boost::asio::ip::address_v4{}, 6771 }); }

    std::size_t read_packet(const std::chrono::milliseconds &timeout, boost::system::error_code &error) {
        m_io_service_.reset();

        boost::asio::high_resolution_timer timer { m_io_service_, timeout };
        timer.async_wait([&](boost::system::error_code) {
            sock_.cancel();
        });

        size_t transferred = 0;
        boost::asio::async_read(sock_, boost::asio::buffer(buffer_), [&](boost::system::error_code ec, size_t tx) {
            error = ec;
            transferred = tx;
        });

        m_io_service_.run();
        return transferred;
    }

private:
    boost::asio::io_service m_io_service_;
    using tcp = boost::asio::ip::tcp;
    tcp::socket sock_{ m_io_service_ };
    std::array<char, 512> buffer_;
};

int main() {
    Program client;
    boost::system::error_code ec;
    while (!ec) {
        client.read_packet(std::chrono::milliseconds(100), ec);
    }
    std::cout << "Exited with '" << ec.message() << "'\n"; // operation canceled in case of timeout
}
If the socket operation succeeds you can see e.g.:
Exited with 'End of file'
Otherwise, if the operation didn't complete within 100 milliseconds, it will print:
Exited with 'Operation canceled'
See also await_operation in this previous answer, which generalizes this pattern a bit more:
boost::asio + std::future - Access violation after closing socket
OK, the code is incorrect. When the timer is canceled, the timer handler is still executed (with an operation_aborted error). For this reason the io_service::run_one() call never blocks.
More information: basic_waitable_timer::cancel
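A tiny standalone illustration of that behaviour (not part of the original question): cancel() does not discard the handler, it runs it with operation_aborted, which is exactly what run_one() picks up:
#include <boost/asio.hpp>
#include <boost/asio/high_resolution_timer.hpp>
#include <chrono>
#include <iostream>

int main() {
    boost::asio::io_service io;
    boost::asio::high_resolution_timer timer{ io, std::chrono::seconds(10) };
    timer.async_wait([](const boost::system::error_code& ec) {
        std::cout << "wait handler: " << ec.message() << "\n"; // "Operation canceled"
    });
    timer.cancel();  // the pending handler is not dropped, only flagged as aborted
    std::cout << "run_one() processed " << io.run_one() << " handler(s)\n"; // prints 1
}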
Thanks for the help.

EOF in boost::async_read with thread pool and boost 1.54

I have a strange problem with my server application. My system is simple: I have one or more devices and one server app that communicate over a network. The protocol has binary packets of variable length, but with a fixed header (that contains info about the current packet size). Example of a packet:
char pct[maxSize] = {};
pct[0] = 0x5a; // preamble
pct[1] = 0xa5; // preamble
pct[2] = 0x07; // packet size
pct[3] = 0x0A; // command
... [payload]
The protocol is built on a command-answer principle.
I use boost::asio for communication - an io_service with a thread pool (4 threads) + async read/write operations (code example below) and create a "query cycle" driven by a timer every 200ms:
query one value from device
get result, query second value
get result, start timer again
This works very well on boost 1.53 (Debug and Release). But when I switch to boost 1.54 (especially in Release mode) the magic begins. My server successfully starts, connects to the device and starts the "query cycle". For about 30-60 seconds everything works well (I receive data, the data is correct), but then I start receiving an asio error in the last read handler (always in the same place). Error type: EOF. After receiving the error, I must disconnect from the device.
Some googling told me that EOF indicates that the other side (the device in my case) initiated the disconnect procedure. But, according to the logic of the device, this cannot be true.
Can somebody explain what's going on? Maybe I need to set some socket options or defines? I see two possible reasons:
my side initiates the disconnect (for some reason I don't know about) and EOF is the result of this action.
some socket timeout firing.
My environment:
OS: Windows 7/8
Compiler: MSVC 2012 Update 3
Sample code of the main "query cycle", adapted from the official boost chat example. All code is simplified to reduce space :)
SocketWorker - low level wrapper for sockets
DeviceWorker - class for device communication
ERes - internal struct for error store
ProtoCmd and ProtoAnswer - wrappers for the raw array command and answer (chat_message analog from the boost chat example)
lw_service_proto namespace - predefined commands and max sizes of packets
So, code samples. Socket wrapper:
namespace b = boost;
namespace ba = boost::asio;
typedef b::function<void(const ProtoAnswer answ)> DataReceiverType;
class SocketWorker
{
private:
typedef ba::ip::tcp::socket socketType;
typedef std::unique_ptr<socketType> socketPtrType;
socketPtrType devSocket;
ProtoCmd sendCmd;
ProtoAnswer rcvAnsw;
//[other definitions]
public:
//---------------------------------------------------------------------------
ERes SocketWorker::Connect(/*[connect settings]*/)
{
ERes res(LGS_RESULT_ERROR, "Connect to device - Unknown Error");
using namespace boost::asio::ip;
boost::system::error_code sock_error;
//try to connect
devSocket->connect(tcp::endpoint(address::from_string(/*[connect settings ip]*/), /*[connect settings port]*/), sock_error);
if(sock_error.value() > 0) {
//[work with error]
devSocket->close();
}
else {
//[res code ok]
}
return res;
}
//---------------------------------------------------------------------------
ERes SocketWorker::Disconnect()
{
if (devSocket->is_open())
{
boost::system::error_code ec;
devSocket->shutdown(bi::tcp::socket::shutdown_send, ec);
devSocket->close();
}
return ERes(LGS_RESULT_OK, "OK");
}
//---------------------------------------------------------------------------
//query any cmd
void SocketWorker::QueryCommand(const ProtoCmd cmd, DataReceiverType dataClb)
{
sendCmd = std::move(cmd); //store command
if (sendCmd.CommandLength() > 0)
{
ba::async_write(*devSocket.get(), ba::buffer(sendCmd.Data(), sendCmd.Length()),
b::bind(&SocketWorker::HandleSocketWrite,
this, ba::placeholders::error, dataClb));
}
else
{
cerr << "Send command error: nothing to send" << endl;
}
}
//---------------------------------------------------------------------------
// boost socket handlers
void SocketWorker::HandleSocketWrite(const b::system::error_code& error,
DataReceiverType dataClb)
{
if (error)
{
cerr << "Send cmd error: " << error.message() << endl;
//[send error to other place]
return;
}
//start reading header of answer (lw_service_proto::headerSize == 3 bytes)
ba::async_read(*devSocket.get(),
ba::buffer(rcvAnsw.Data(), lw_service_proto::headerSize),
b::bind(&SocketWorker::HandleSockReadHeader,
this, ba::placeholders::error, dataClb));
}
//---------------------------------------------------------------------------
//handler for read header
void SocketWorker::HandleSockReadHeader(const b::system::error_code& error, DataReceiverType dataClb)
{
if (error)
{
//[error working]
return;
}
//decode header (check preambule and get full packet size) and read answer payload
if (rcvAnsw.DecodeHeaderAndGetCmdSize())
{
ba::async_read(*devSocket.get(),
ba::buffer(rcvAnsw.Answer(), rcvAnsw.AnswerLength()),
b::bind(&SocketWorker::HandleSockReadBody,
this, ba::placeholders::error, dataClb));
}
}
//---------------------------------------------------------------------------
//handler for answer payload
void SocketWorker::HandleSockReadBody(const b::system::error_code& error, DataReceiverType dataClb)
{
//if no error - send answer to 'master'
if (!error){
if (dataClb != nullptr)
dataClb(rcvAnsw);
}
else{
//[error process]
//here I get EOF in Release mode
}
}
};
Device worker
class DeviceWorker
{
private:
const static int LW_QUERY_TIME = 200;
LWDeviceSocketWorker sockWorker;
ba::io_service& timerIOService;
typedef std::shared_ptr<ba::deadline_timer> TimerPtr;
TimerPtr queryTimer;
bool queryCycleWorking;
//[other definitions]
public:
ERes DeviceWorker::Connect()
{
ERes intRes = sockWorker.Connect(/*[connect settings here]*/);
if(intRes != LGS_RESULT_OK) {
//[set result to error]
}
else {
//[set result to success]
//start "query cycle"
StartNewCycleQuery();
}
return intRes;
}
//---------------------------------------------------------------------------
ERes DeviceWorker::Disconnect()
{
return sockWorker.Disconnect();
}
//---------------------------------------------------------------------------
void DeviceWorker::StartNewCycleQuery()
{
queryCycleWorking = true;
//start timer
queryTimer = make_shared<ba::deadline_timer>(timerIOService, bt::milliseconds(LW_QUERY_TIME));
queryTimer->async_wait(boost::bind(&DeviceWorker::HandleQueryTimer,
this, boost::asio::placeholders::error));
}
//---------------------------------------------------------------------------
void DeviceWorker::StopCycleQuery()
{
//kill timer
if (queryTimer)
queryTimer->cancel();
queryCycleWorking = false;
}
//---------------------------------------------------------------------------
//timer handler
void DeviceWorker::HandleQueryTimer(const b::system::error_code& error)
{
if (!error)
{
ProtoCmd cmd;
//query for first value
cmd.EncodeCommandCore(lw_service_proto::cmdGetAlarm, 1);
sockWorker.QueryCommand(cmd, boost::bind(&DeviceWorker::ReceiveAlarmCycle,
this, _1));
}
}
//---------------------------------------------------------------------------
//receive first value
void DeviceWorker::ReceiveAlarmCycle(ProtoAnswer adata)
{
//check and fix last bytes (remove \r\n from some commands)
adata.CheckAndFixFooter();
//[working with answer]
if (queryCycleWorking)
{
//query for second value
ProtoCmd cmd;
cmd.EncodeCommandCore(lw_service_proto::cmdGetEnergyLevel, 1);
sockWorker.QueryCommand(cmd, b::bind(&DeviceWorker::ReceiveEnergyCycle,
this, _1));
}
}
//---------------------------------------------------------------------------
//receive second value
void DeviceWorker::ReceiveEnergyCycle(ProtoAnswer edata)
{
//check and fix last bytes (remove \r\n from some commands)
edata.CheckAndFixFooter();
//[working with second value]
//start new "query cycle"
if (queryCycleWorking)
StartNewCycleQuery();
}
};
Any ideas are welcome :)
Edit:
After several tests I see another picture:
this issue reproduces on boost 1.54 only (Debug and Release mode, Release much faster); with boost 1.53 there are no more errors (maybe I cleaned my code poorly before the first rebuilds....)
with boost 1.54 and 1 thread (instead of 4) everything works well
I also spent some time with the debugger and the boost sources and came to some conclusions:
When I receive the EOF, my data has already been fully received.
The EOF indicates that there is nothing left to transfer in this operation, i.e. the socket result flag is 0 (no error), but the boost operation flag is EOF (transferred bytes == 0)
At the moment I am forced to switch back to boost 1.53...
I had the exact same problem and I am quite sure that this is a bug in boost::asio 1.54.0.
Here is the bug report.
The solution is effectively to go back to 1.53, although there is a patch available for 1.54 on the bug report page.
If your application works fine with a single thread invoking io_service::run() but fails with four threads, you very likely have a race condition. This type of problem is difficult to diagnose. Generally speaking, you should ensure your devSocket has at most one outstanding async_read() and one outstanding async_write() operation. Your current implementation of SocketWorker::QueryCommand() unconditionally invokes async_write(), which may violate the ordering requirement documented as follows:
This operation is implemented in terms of zero or more calls to the
stream's async_write_some function, and is known as a composed
operation. The program must ensure that the stream performs no other
write operations (such as async_write, the stream's async_write_some
function, or any other composed operations that perform writes) until
this operation completes.
The classic solution to this problem is to maintain a queue of outgoing messages. If a previous write is outstanding, append the next outgoing message to the queue. When the previous write completes, initiate the async_write() for the next message in the queue. When using multiple threads invoking io_service::run() you may need to use a strand as the linked answer does.
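As a very rough sketch of that queue, adapted to the SocketWorker from the question (the strand_ and pendingCmds members and DoWriteFront() are hypothetical additions, not part of the original code):
// Hypothetical members added to SocketWorker:
//   ba::io_service::strand strand_;   // constructed from the same io_service
//   std::deque<std::pair<ProtoCmd, DataReceiverType>> pendingCmds;
void SocketWorker::QueryCommand(const ProtoCmd cmd, DataReceiverType dataClb)
{
    strand_.post([this, cmd, dataClb]() {
        bool writeInProgress = !pendingCmds.empty();
        pendingCmds.push_back(std::make_pair(cmd, dataClb));
        if (!writeInProgress)
            DoWriteFront();                  // nothing outstanding, start writing
    });
}

void SocketWorker::DoWriteFront()
{
    sendCmd = pendingCmds.front().first;
    ba::async_write(*devSocket, ba::buffer(sendCmd.Data(), sendCmd.Length()),
        strand_.wrap(b::bind(&SocketWorker::HandleSocketWrite, this,
                             ba::placeholders::error,
                             pendingCmds.front().second)));
}
In HandleSocketWrite, pop the front element and call DoWriteFront() again if the queue is not empty, so that at most one async_write is ever outstanding on the socket.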

Intermittently no data delivered through boost::asio / io completion port

Problem
I am using boost::asio for a project where two processes on the same machine communicate using TCP/IP. One generates data to be read by the other, but I am encountering a problem where intermittently no data is being sent through the connection. I've boiled this down to a very simple example below, based on the async tcp echo server example.
The processes (source code below) start out fine, delivering data at a fast rate from the sender to the receiver. Then all of a sudden, no data at all is delivered for about five seconds. Then data is delivered again until the next inexplicable pause. During these five seconds, the processes eat 0% CPU and no other processes seem to do anything in particular. The pause is always the same length - five seconds.
I am trying to figure out how to get rid of these stalls and what causes them.
CPU usage during an entire run:
Notice how there are three dips of CPU usage in the middle of the run - a "run" is a single invocation of the server process and the client process. During these dips, no data was delivered. The number of dips and their timing differs between runs - some times no dips at all, some times many.
I am able to affect the "probability" of these stalls by changing the size of the read buffer - for instance if I make the read buffer a multiple of the send chunk size it appears that this problem almost goes away, but not entirely.
Source and test description
I've compiled the below code with Visual Studio 2005, using Boost 1.43 and Boost 1.45. I have tested on Windows Vista 64 bit (on a quad-core) and Windows 7 64 bit (on both a quad-core and a dual-core).
The server accepts a connection and then simply reads and discards data. Whenever a read is performed a new read is issued.
The client connects to the server, then puts a bunch of packets into a send queue. After this it writes the packets one at a time. Whenever a write has completed, the next packet in the queue is written. A separate thread monitors the queue size and prints this to stdout every second. During the io stalls, the queue size remains exactly the same.
I have tried to use scatter/gather io (writing multiple packets in one system call), but the result is the same. If I disable IO completion ports in Boost using BOOST_ASIO_DISABLE_IOCP, the problem appears to go away, but at the price of significantly lower throughput.
// Example is adapted from async_tcp_echo_server.cpp which is
// Copyright (c) 2003-2010 Christopher M. Kohlhoff (chris at kohlhoff dot com)
//
// Start program with -s to start as the server
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0501
#endif
#include <iostream>
#include <tchar.h>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>
#define PORT "1234"
using namespace boost::asio::ip;
using namespace boost::system;
class session {
public:
session(boost::asio::io_service& io_service) : socket_(io_service) {}
void do_read() {
socket_.async_read_some(boost::asio::buffer(data_, max_length),
boost::bind(&session::handle_read, this, _1, _2));
}
boost::asio::ip::tcp::socket& socket() { return socket_; }
protected:
void handle_read(const error_code& ec, size_t bytes_transferred) {
if (!ec) {
do_read();
} else {
delete this;
}
}
private:
tcp::socket socket_;
enum { max_length = 1024 };
char data_[max_length];
};
class server {
public:
explicit server(boost::asio::io_service& io_service)
: io_service_(io_service)
, acceptor_(io_service, tcp::endpoint(tcp::v4(), atoi(PORT)))
{
session* new_session = new session(io_service_);
acceptor_.async_accept(new_session->socket(),
boost::bind(&server::handle_accept, this, new_session, _1));
}
void handle_accept(session* new_session, const error_code& ec) {
if (!ec) {
new_session->do_read();
new_session = new session(io_service_);
acceptor_.async_accept(new_session->socket(),
boost::bind(&server::handle_accept, this, new_session, _1));
} else {
delete new_session;
}
}
private:
boost::asio::io_service& io_service_;
boost::asio::ip::tcp::acceptor acceptor_;
};
class client {
public:
explicit client(boost::asio::io_service &io_service)
: io_service_(io_service)
, socket_(io_service)
, work_(new boost::asio::io_service::work(io_service))
{
io_service_.post(boost::bind(&client::do_init, this));
}
~client() {
packet_thread_.join();
}
protected:
void do_init() {
// Connect to the server
tcp::resolver resolver(io_service_);
tcp::resolver::query query(tcp::v4(), "localhost", PORT);
tcp::resolver::iterator iterator = resolver.resolve(query);
socket_.connect(*iterator);
// Start packet generation thread
packet_thread_.swap(boost::thread(
boost::bind(&client::generate_packets, this, 8000, 5000000)));
}
typedef std::vector<unsigned char> packet_type;
typedef boost::shared_ptr<packet_type> packet_ptr;
void generate_packets(long packet_size, long num_packets) {
// Add a single dummy packet multiple times, then start writing
packet_ptr buf(new packet_type(packet_size, 0));
write_queue_.insert(write_queue_.end(), num_packets, buf);
queue_size = num_packets;
do_write_nolock();
// Wait until all packets are sent.
while (long queued = InterlockedExchangeAdd(&queue_size, 0)) {
std::cout << "Queue size: " << queued << std::endl;
Sleep(1000);
}
// Exit from run(), ignoring socket shutdown
work_.reset();
}
void do_write_nolock() {
const packet_ptr &p = write_queue_.front();
async_write(socket_, boost::asio::buffer(&(*p)[0], p->size()),
boost::bind(&client::on_write, this, _1));
}
void on_write(const error_code &ec) {
if (ec) { throw system_error(ec); }
write_queue_.pop_front();
if (InterlockedDecrement(&queue_size)) {
do_write_nolock();
}
}
private:
boost::asio::io_service &io_service_;
tcp::socket socket_;
boost::shared_ptr<boost::asio::io_service::work> work_;
long queue_size;
std::list<packet_ptr> write_queue_;
boost::thread packet_thread_;
};
int _tmain(int argc, _TCHAR* argv[]) {
try {
boost::asio::io_service io_svc;
bool is_server = argc > 1 && 0 == _tcsicmp(argv[1], _T("-s"));
std::auto_ptr<server> s(is_server ? new server(io_svc) : 0);
std::auto_ptr<client> c(is_server ? 0 : new client(io_svc));
io_svc.run();
} catch (std::exception& e) {
std::cerr << "Exception: " << e.what() << "\n";
}
return 0;
}
So my question is basically:
How do I get rid of these stalls?
What causes this to happen?
Update: Contrary to what I stated above, there appears to be some correlation with disk activity: starting a large directory copy on the disk while the test is running seems to increase the frequency of the io stalls. Could this indicate that Windows IO Prioritization is kicking in? Since the pauses are always the same length, it does sound somewhat like a timeout somewhere in the OS io code...
adjust boost::asio::socket_base::send_buffer_size and receive_buffer_size
adjust max_length to a larger number. Since TCP is stream oriented, don't think of it as receiving single packets. This is most likely causing some sort of "gridlock" between TCP send/receive windows.
I recently encountered a very similar sounding problem, and have a solution that works for me. I have an asynchronous server/client written in asio that sends and receives video (and small request structures), and I was seeing frequent 5 second stalls just as you describe.
Our fix was to increase the size of the socket buffers on each end, and to disable the Nagle algorithm.
pSocket->set_option( boost::asio::ip::tcp::no_delay( true) );
pSocket->set_option( boost::asio::socket_base::send_buffer_size( s_SocketBufferSize ) );
pSocket->set_option( boost::asio::socket_base::receive_buffer_size( s_SocketBufferSize ) );
It might be that only one of the above options is critical, but I've not investigated this further.
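For reference, a small sketch of where such options might be applied in the earlier example program (tune_socket and the buffer value are illustrative placeholders, not from the answer above; the right size depends on the workload):
// Illustrative helper: apply the options right after connect() (client side)
// or accept (server side), before any data is exchanged.
const int kSocketBufferSize = 4 * 1024 * 1024;   // placeholder value, tune as needed

void tune_socket(boost::asio::ip::tcp::socket& s) {
    s.set_option(boost::asio::ip::tcp::no_delay(true));
    s.set_option(boost::asio::socket_base::send_buffer_size(kSocketBufferSize));
    s.set_option(boost::asio::socket_base::receive_buffer_size(kSocketBufferSize));
}

// client::do_init():       after socket_.connect(*iterator);  call tune_socket(socket_);
// server::handle_accept(): before new_session->do_read();     call tune_socket(new_session->socket());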