Boost::asio::async_write doesn't seem to free memory - c++

I have a cluster program that uses Boost.Asio for the network part.
I'm using the async_write function to write messages from the server to the client:
boost::asio::async_write( *m_Socket,
                          boost::asio::buffer( iData, iSize ),
                          boost::bind(
                              &MyObject::handle_write, this,
                              boost::asio::placeholders::error ) );
My handle_write method:
void
MyObject::handle_write( const boost::system::error_code& error )
{
    std::cout << "handle_write" << std::endl;
    if (error)
    {
        std::cout << "Write error !" << std::endl;
        m_Server->RemoveSession(this);
    }
}
It seems to work well. When I use a memory-leak detector, there are no leaks at all.
But my program is supposed to run for many days without interruption, and during testing it turned out that I didn't have enough memory... After some inspection, I found that my program was allocating around 0.3 MB per second, and with a memory validator I found that the allocations were happening inside boost::asio::async_write...
I checked the documentation and I think I'm using it the correct way... Am I missing something?
EDIT 1:
This is how I call the function that calls async_write itself:
NetworkMessage* msg = new NetworkMessage;
sprintf(msg->Body(), "%s", iData );
m_BytesCount += msg->Length();
uint32 nbSessions = m_Sessions.size();
// Send to all clients
for( uint32 i=0; i < nbSessions; i++)
{
    m_Sessions[i]->Write( msg->Data(), msg->Length() );
    // ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
}
delete msg;
msg->Data() is the data passed to async_write.
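(Note: the data handed to boost::asio::buffer() is not copied by async_write, so it has to stay valid until handle_write runs. A common pattern, sketched below purely as an assumption of how the class could look, with the same names as above, is to bind a shared_ptr owning a copy of the data into the completion handler:)

#include <vector>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>

// Keep a copy of the outgoing bytes alive until the completion handler runs,
// by binding a shared_ptr to the data into the handler itself.
void MyObject::Write( const char* iData, std::size_t iSize )
{
    boost::shared_ptr< std::vector< char > > buffer(
        new std::vector< char >( iData, iData + iSize ) );

    boost::asio::async_write( *m_Socket,
        boost::asio::buffer( *buffer ),
        boost::bind( &MyObject::handle_write, this,
                     boost::asio::placeholders::error,
                     buffer ) );              // the bound copy owns the data

}

void MyObject::handle_write( const boost::system::error_code& error,
                             boost::shared_ptr< std::vector< char > > /*buffer*/ )
{
    // The buffer is released only here, after the write has fully completed.
    if ( error )
        m_Server->RemoveSession( this );
}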


ZeroMQ - subscriber is able to receive data from 1st publisher but does not receive from 2nd publisher which is up after a few loops

In the following code, the publisher thread publishes 5 messages; after that, a new publisher socket is supposed to send data to the subscriber, but the subscriber sits in a while(1) loop on recv() and never gets a message from the 2nd publisher. How can the subscriber connect to publisher 2, with some exception handling to show that the subscriber is still trying to connect?
I tried XPUB/XSUB, PUSH/PULL and also ZMQ_HEARTBEAT, but no exception is caught. I also tried "inproc://#1" instead of "tcp://127.0.0.1:5555". Nothing worked.
#include <future>
#include <iostream>
#include <string>
#include "zmq.hpp"
#include <strings.h>
#include <stdint.h>
#include <chrono>
#include <thread>

void PublisherThread(zmq::context_t* ctx)
{
    try{
        std::cout << "PublisherThread: " << std::endl;
        zmq::socket_t publisher(*ctx, ZMQ_PUB);
        publisher.bind("tcp://127.0.0.1:5555");
        int counter = 0;
        while (true){
            try{
                publisher.send(zmq::str_buffer("A"), zmq::send_flags::sndmore);
                publisher.send(zmq::str_buffer("Message in A envelope\n"));
                std::this_thread::sleep_for(std::chrono::milliseconds(2000));
                publisher.send(zmq::str_buffer("B"), zmq::send_flags::sndmore);
                publisher.send(zmq::str_buffer("Message in B envelope\n"));
                std::this_thread::sleep_for(std::chrono::milliseconds(2000));
                publisher.send(zmq::str_buffer("C"), zmq::send_flags::sndmore);
                publisher.send(zmq::str_buffer("Message in C envelope\n"));
                std::this_thread::sleep_for(std::chrono::milliseconds(2000));
                if(counter == 5){
                    publisher.close();
                    std::this_thread::sleep_for(std::chrono::milliseconds(2000));
                    counter = 0;
                    break;
                }
                else{
                    counter++;
                }
            }
            catch(const zmq::error_t& ze){
                std::cout << "PublisherThread: catch 2:" << ze.what() << std::endl;
            }
        }
    }
    catch(const zmq::error_t& ze){
        std::cout << "PublisherThread: catch 1:" << ze.what() << std::endl;
    }

    try{
        zmq::socket_t publisher2(*ctx, ZMQ_PUB);
        publisher2.bind("tcp://127.0.0.1:5555");
        int counter = 0;
        while (true){
            try{
                // Write three messages, each with an envelope and content
                publisher2.send(zmq::str_buffer("A"), zmq::send_flags::sndmore);
                publisher2.send(zmq::str_buffer("Message in A envelope\n"));
                std::this_thread::sleep_for(std::chrono::milliseconds(2000));
                publisher2.send(zmq::str_buffer("B"), zmq::send_flags::sndmore);
                publisher2.send(zmq::str_buffer("Message in B envelope\n"));
                std::this_thread::sleep_for(std::chrono::milliseconds(2000));
                publisher2.send(zmq::str_buffer("C"), zmq::send_flags::sndmore);
                publisher2.send(zmq::str_buffer("Message in C envelope\n"));
                std::this_thread::sleep_for(std::chrono::milliseconds(2000));
                if(counter == 50){
                    publisher2.close();
                    break;
                }
                else{
                    counter++;
                }
            }
            catch(const zmq::error_t& ze){
                std::cout << "PublisherThread: catch 4:" << ze.what() << std::endl;
            }
        }
    }
    catch(const zmq::error_t& ze){
        std::cout << "PublisherThread: catch 3:" << ze.what() << std::endl;
    }
    std::cout << "PublisherThread: exiting:" << std::endl;
}

void SubscriberThread1(zmq::context_t* ctx)
{
    std::cout << "SubscriberThread1: " << std::endl;
    zmq::socket_t subscriber(*ctx, ZMQ_SUB);
    subscriber.setsockopt(ZMQ_SUBSCRIBE, "A", 1);
    subscriber.setsockopt(ZMQ_SUBSCRIBE, "B", 1);
    subscriber.connect("tcp://127.0.0.1:5555");
    while (1){
        try{
            zmq::message_t address;
            zmq::recv_result_t result = subscriber.recv(address);
            // Read message contents
            zmq::message_t contents;
            result = subscriber.recv(contents);
            std::cout << "Thread2: " << std::string(static_cast<char*>(contents.data()), contents.size()) << std::endl;
        }
        catch(const zmq::error_t& ze){
            std::cout << "subscriber catch error:" << ze.what() << std::endl;
        }
    }
}

int main()
{
    zmq::context_t* zmq_ctx = new zmq::context_t();
    std::thread thread1(PublisherThread, zmq_ctx);
    std::thread thread2(SubscriberThread1, zmq_ctx);
    thread1.join();
    thread2.join();
}
Q :" How subscriber can connect to publisher 2 with some exception handling that subscriber 2 is trying to connect. "
A :
The as-is code runs everything serially inside thread1: only after the first {_try_scope_of_PUB_in_an_infinite_loop_} is broken out of, or otherwise terminated and exited from, does the same thread1 code move on to the other one, leaving the former and only then entering {_a_try_scope_of_PUB2_again_into_another_infinite_loop_}.
This alone explains why the subscriber-side SUB never receives a single message from the second publisher, if the ZeroMQ configuration property addressed as ZMQ_LINGER silently blocks the closing of the first PUB-archetype socket instance still owned by the publisher. In some native-API versions this LINGER attribute was documented to default to an infinite wait, so the release would never happen while undelivered messages were still present in the internal queue; in some cases the IP:PORT even remained occupied "beyond" the program's exit, leaving a hardware reboot as the strategy of last resort to release such a hanging Context()-instance from infinite occupation of that port (this was with Python code under the pyzmq language wrapper for the native ZeroMQ API DLL, all run on Windows - definitely not an experience one would like to have a single more time). In newer versions of the native API, LINGER got a different, non-infinite default, so the version dependence is best overcome by setting the LINGER property explicitly, right after the socket instance gets created - a sign of good engineering practice.
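A minimal sketch of that explicit LINGER setting, using the setsockopt() call style already present in this code (newer cppzmq releases expose the same option via zmq::sockopt::linger):

zmq::socket_t publisher( *ctx, ZMQ_PUB );

// Make close() deterministic regardless of the library version's default:
// LINGER == 0 discards any undelivered messages on close, so the socket
// (and the port it bound) is released immediately.
int linger = 0;
publisher.setsockopt( ZMQ_LINGER, &linger, sizeof( linger ) );
// newer cppzmq spells the same thing as:
// publisher.set( zmq::sockopt::linger, 0 );

publisher.bind( "tcp://127.0.0.1:5555" );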
Looking at the option of letting each publisher work from its own autonomous thread - one PUB inside a threadP1 plus another PUB inside a threadP2:
int main()
{
    zmq::context_t* zmq_ctx = new zmq::context_t();
    std::thread threadP1( PubThread1, zmq_ctx );
    std::thread threadP2( PubThread2, zmq_ctx );
    std::thread threadS1( SubThread1, zmq_ctx );
    threadP1.join();
    threadP2.join();
    threadS1.join();
}
here we indeed have both PUB sides independently entering their own try{...} scope and trying to proceed into the infinite while(){...} loop present there (ignoring the hidden break branch for the moment), yet this creates a new problem ...
#define TAKE_A_NAP std::this_thread::sleep_for( std::chrono::milliseconds( 2000 ) )

void PubThread1( zmq::context_t* ctx )
{
    try{
        std::cout << "PubThread1: " << std::endl;
        zmq::socket_t publisher( *ctx, ZMQ_PUB );
     // publisher.setsockopt( ZMQ_LINGER, 0 );      // property settings
        publisher.bind( "tcp://127.0.0.1:5555" );   // resource .bind()
        int counter = 0;
        while ( true ){
            try{
                publisher.send( zmq::str_buffer( "A" ),
                                zmq::send_flags::sndmore );
                publisher.send( zmq::str_buffer( "Msg in envelope A from PUB1\n" ) );
                TAKE_A_NAP;
                publisher.send( zmq::str_buffer( "B" ),
                                zmq::send_flags::sndmore );
                publisher.send( zmq::str_buffer( "Msg in envelope B from PUB1\n" ) );
                TAKE_A_NAP;
                publisher.send( zmq::str_buffer( "C" ),
                                zmq::send_flags::sndmore );
                publisher.send( zmq::str_buffer( "Msg in envelope C from PUB1\n" ) );
                if ( counter > 4 ){
                    counter = 0;
                    publisher.close();
                    TAKE_A_NAP;
                    break;
                }
            }
            catch( const zmq::error_t& ze ){
                std::cout << "PubThread1: catch 2:" << ze.what() << std::endl;
            }
        }
    }
    catch( const zmq::error_t& ze ){
        std::cout << "PubThread1: catch 1:" << ze.what() << std::endl;
        // publisher.close();                       // a fair place to dismantle
        //                                          // all resources here, to avoid
        //                                          // a hanging instance ALAP
    }
}
We now need to solve a new problem, which only appears at this point: colliding attempts to .bind()-acquire one and the same resource twice, which for obvious reasons cannot succeed - the "slower" of the pair will fail to .bind("tcp://127.0.0.1:5555") onto an ADDRESS:PORT resource that the faster one has already managed to acquire and use.
Using different, non-colliding ADDRESS:PORT resources is one way; using a reversed bind/connect scenario is the other.
There the SUB side .bind()-s, and both of the { PUB1 | PUB2 } .connect() to a single, publicly known SUB-side AccessPoint ADDRESS:PORT.
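A rough sketch of that reversed scenario (same context and setsockopt() style as the code above; the address is the same illustrative one):

// SUB side owns the single, publicly known AccessPoint:
zmq::socket_t subscriber( *ctx, ZMQ_SUB );
subscriber.setsockopt( ZMQ_SUBSCRIBE, "A", 1 );
subscriber.setsockopt( ZMQ_SUBSCRIBE, "B", 1 );
subscriber.bind( "tcp://127.0.0.1:5555" );      // SUB .bind()-s exactly once

// each PUB (each in its own thread) just .connect()-s; no bind collision:
zmq::socket_t publisher1( *ctx, ZMQ_PUB );
publisher1.connect( "tcp://127.0.0.1:5555" );

zmq::socket_t publisher2( *ctx, ZMQ_PUB );
publisher2.connect( "tcp://127.0.0.1:5555" );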
Initial ZeroMQ versions did topic filtering on the SUB side, meaning all messages were delivered "across" the network to all SUB-s, and the topic filter was applied only after that delivery, to decide whether a match was found and to drop the message if not. More recent versions (with larger RAM and multi-core CPUs available) started to process topic filtering (and also all subscription management) on the PUB side. This increases the processing and RAM overhead on the PUB side, and it may run us into a problem: what happens if a properly subscribed-to PUB1 gets closed (throwing away all the subscription management it carried), leaving us in doubt whether a newly delivered PUB2, once properly .bind()-ed to the same ADDRESS:PORT, will somehow receive the subscription requests repeated from the SUB side in the newer API mode. Details and version changes do matter here. Best to walk on the safer side - the old assembler practitioners' #ASSUME NOTHING directive serves best if obeyed here.
Last, but not least, a handy notice. The PUB/SUB archetype's topic filtering does not require going into the complexities of multipart-message composition. The topic filter is a pure left-to-right ASCII match, so it simply checks the beginning of the message, be it a multipart message composed of { "A" | "...whatever.text.here..." } or a plain message "A...whatever...", "ABC...whatever...", "B884884848484848484...", "C<lf><lf><0xD0D0><cr>" and the like - simplicity helps performance and was always part of the Zen-of-Zero.
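To illustrate that left-to-right prefix matching (a small sketch reusing the subscriber and publisher sockets from the code above; the payloads are purely illustrative):

subscriber.setsockopt( ZMQ_SUBSCRIBE, "A", 1 );   // match anything starting with 'A'

// these pass the "A" filter, multipart or not:
publisher.send( zmq::str_buffer( "A" ), zmq::send_flags::sndmore );
publisher.send( zmq::str_buffer( "...whatever.text.here..." ) );
publisher.send( zmq::str_buffer( "ABC...whatever..." ) );

// this one is dropped by that filter (first byte is not 'A'):
publisher.send( zmq::str_buffer( "B884884848484848484..." ) );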

ZeroMQ socket.recv() raised a STACK_OVERFLOW exception

If this code is used in a .dll, a call to socket.recv() raises a STACK_OVERFLOW exception, but when the same code is compiled as an .exe it works.
Why?
I run the .dll test with "C:\windows\system32\rundll32.exe myDll.dll StartUp".
void StartUp()
{
    zmq::context_t context(1);
    zmq::socket_t socket(context, ZMQ_REP);
    socket.bind("tcp://127.0.0.1:3456");
    zmq::message_t msgIN, msgOUT("test", 4);
    while (true){
        socket.recv(&msgIN);
        socket.send(msgOUT);
    };
}
callstack :
libzmq-v120-mt-gd-4_2_2.dll!zmq::mailbox_t::recv(zmq::command_t * cmd_=0x0231f700, int timeout_=0x00000000)
libzmq-v120-mt-gd-4_2_2.dll!zmq::io_thread_t::in_event()
libzmq-v120-mt-gd-4_2_2.dll!zmq::select_t::loop()
libzmq-v120-mt-gd-4_2_2.dll!zmq::select_t::worker_routine(void * arg_=0x002f1778)
libzmq-v120-mt-gd-4_2_2.dll!thread_routine(void * arg_=0x002f17c0)
main thread callstack:
libzmq-v120-mt-gd-4_2_2.dll!zmq::signaler_t::wait(int timeout_=0xffffffff)
libzmq-v120-mt-gd-4_2_2.dll!zmq::mailbox_t::recv(zmq::command_t * cmd_=0x0019f3c0, int timeout_=0xffffffff)
libzmq-v120-mt-gd-4_2_2.dll!zmq::socket_base_t::process_commands(int timeout_, bool throttle_)
libzmq-v120-mt-gd-4_2_2.dll!zmq::socket_base_t::recv(zmq::msg_t * msg_=0x0019f628, int flags_=0x00000000)
libzmq-v120-mt-gd-4_2_2.dll!s_recvmsg(zmq::socket_base_t * s_=0x006f6c70, zmq_msg_t * msg_=0x0019f628, int flags_=0x00000000)
libzmq-v120-mt-gd-4_2_2.dll!zmq_msg_recv(zmq_msg_t * msg_=0x0019f628, void * s_=0x006f6c70, int flags_=0x00000000)
mydll.dll!zmq::socket_t::recv(zmq::message_t * msg_=0x0019f628, int flags_=0x00000000)
mydll.dll!StartUp()
Update:
this example also crashes for the same reason. Does anyone know any reasons for the stack-overflow exception?
zmq::context_t context(1);
zmq::socket_t socket(context, ZMQ_REP);
socket.bind("tcp://*:7712");
while (1){
    Sleep(10);
}
A reverse problem-isolation MCVE:
And how did this myDll.dll test work, when run by C:\windows\system32\rundll32.exe myDll.dll StartUp? Post the screen outputs.
void StartUp()
{
    std::cout << "INF:: ENTRY POINT ( C:\\windows\\system32\\rundll32.exe myDll.dll StartUp )" << std::endl;
    std::cout << "INF:: WILL SLEEP  ( C:\\windows\\system32\\rundll32.exe myDll.dll StartUp )" << std::endl;
    Sleep( 10 );
    std::cout << "INF:: SLEPT WELL  ( C:\\windows\\system32\\rundll32.exe myDll.dll StartUp )" << std::endl;
    std::cout << "INF:: WILL RETURN ( C:\\windows\\system32\\rundll32.exe myDll.dll StartUp )" << std::endl;
}
The reason for the crash is the SizeOfStackCommit value in the OPTIONAL_HEADER of the rundll32 file.
It is too small (0xC000); I changed it to 0x100000 and now everything works.
ZeroMQ objects require a certain respect to work with:
there are many under-the-radar features that may wreak havoc, as you have already seen on your screen.
Best to read with due care both the ZeroMQ C++ binding reference documentation and the original ZeroMQ API (which is often referenced in the C++ binding as well).
Both emphasise never handling zmq::message_t instances directly, but only via "service" functions (often re-wrapped as instance methods in C++).
zmq::message_t messageIN,
               messageOUT;
bool           successFlag;

while (true){
    successFlag = socket.recv( &messageIN );
    assert( successFlag && "EXC: .recv( &messageIN )" );
    /* The zmq_recv() function shall receive a message
       from the socket referenced by the socket argument
       and store it in the message referenced by the msg
       argument.
       Any content previously stored in msg shall be
       properly deallocated.
       If there are no messages available on the specified
       socket the zmq_recv() function shall block
       until the request can be satisfied.
    */
    messageOUT.copy( messageIN );
    successFlag = socket.send( messageOUT );
    assert( successFlag && "EXC: .send( messageOUT )" );
    /* The zmq_send() function shall queue the message
       referenced by the msg argument to be sent to
       the socket referenced by the socket argument.
       The flags argument is a combination of the flags
       defined { ZMQ_NOBLOCK, ZMQ_SNDMORE }
       The zmq_msg_t structure passed to zmq_send()
       is nullified during the call.
       If you want to send the same message to multiple
       sockets you have to copy it (e.g.
       using zmq_msg_copy() ).
       A successful invocation of zmq_send()
       does not indicate that the message
       has been transmitted to the network,
       only that it has been queued on the socket
       and ØMQ has assumed responsibility for the message.
    */
};
My suspect is reference counting: more and more instances, produced by a zmq::message_t message; constructor in an infinite while( true ){...} loop, none of which ever meets its own fair destructor. The stack, having a physically limited capacity and no stack-management care inside the DLL, will fail sooner or later.
zmq::message_t instances are quite an expensive toy, so good resource-management practices (pre-allocation, reuse, controlled destruction) are always welcome in professional code.
Q.E.D.
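As a side note on the pre-allocation / reuse advice above, a minimal sketch of reusing one pair of message instances across iterations - assuming the message_t::rebuild() helper of the cppzmq binding is available in the version being used:

#include <cassert>
#include <cstring>

zmq::message_t messageIN;        // constructed once, outside the loop
zmq::message_t messageOUT;       // dtto

while ( true )
{
    if ( !socket.recv( &messageIN ) )     // .recv() releases messageIN's
        break;                            //  previous content by itself

    messageOUT.rebuild( messageIN.size() );              // re-size in place
    std::memcpy( messageOUT.data(), messageIN.data(), messageIN.size() );

    if ( !socket.send( messageOUT ) )
        break;
}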
Tail remarks for clarity purposes:
A bit paraphrasing Dijkstra's view on error hunting and software testing: "If I see no error, that does not mean there is none in this piece of code (all the less so if any external functions are linked in addition to it)."
No stack allocations?
Yes, no visible ones.
The ZeroMQ API sheds more light on it:
"The zmq_msg_init_size() function shall allocate any resources required to store a message size bytes long and initialise the message object referenced by msg to represent the newly allocated message.
The implementation shall choose whether to store message content on the stack (small messages) or on the heap (large messages). For performance reasons zmq_msg_init_size() shall not clear the message data."
Many years spent using cross-platform distributed systems based on the ZeroMQ API (since v2.1+) have taught me a lot about being careful with explicit resource control - all the more so if you did not develop your own language binding for the native API.
After all the unsupported criticism, let's add one more citation from ZeroMQ:
This shows how proper, indirect manipulation of the message_t content is done by the library's C++ bindings themselves, wrapped into trivial helper functions:
from zhelpers.hpp:
//  Receive 0MQ string from socket and convert into string
static std::string
s_recv (zmq::socket_t & socket) {
    zmq::message_t message;
    socket.recv(&message);
    return std::string(static_cast<char*>(message.data()), message.size());
}

//  Convert string to 0MQ string and send to socket
static bool
s_send (zmq::socket_t & socket, const std::string & string) {
    zmq::message_t message(string.size());
    memcpy (message.data(), string.data(), string.size());
    bool rc = socket.send (message);
    return (rc);
}

//  Sends string as 0MQ string, as multipart non-terminal
static bool
s_sendmore (zmq::socket_t & socket, const std::string & string) {
    zmq::message_t message(string.size());
    memcpy (message.data(), string.data(), string.size());
    bool rc = socket.send (message, ZMQ_SNDMORE);
    return (rc);
}

boost::asio::async_write and buffers over 65536 bytes

I have a very simple method, with the purpose of responding to an incoming message, and then closing the connection:
void respond ( const std::string message )
{
    std::string str = "<?xml version=\"1.0\"?>";
    Controller & controller = Controller::Singleton();
    if ( auto m = handleNewMessage( message ) )
    {
        auto reply = controller.FIFO( m );
        str.append( reply );
    }
    else
        str.append ( "<Error/>" );

    std::size_t bytes = str.size() * sizeof( std::string::value_type );
    std::cout << "Reply bytesize " << bytes << std::endl;

    boost::asio::async_write(
        socket_,
        boost::asio::buffer( str ),
        boost::bind(
            &TCPConnection::handle_write,
            shared_from_this(),
            boost::asio::placeholders::error,
            boost::asio::placeholders::bytes_transferred
        ));
}
void handle_write ( const boost::system::error_code & error, size_t bytes_transferred )
{
    if ( error )
    {
        std::cerr << "handle_write Error: " << error.message() << std::endl;
        std::cerr << "handle_write Bytes sent: " << bytes_transferred << std::endl;
    }
    else
    {
        std::cerr << "handle_write Bytes sent: " << bytes_transferred << std::endl;
        socket_.close();
    }
}
I know the problem is that boost::asio::async_write does not complete the writing operation, because the output from the above operations is:
Reply bytesize: 354275
handle_write Bytes sent: 65536
Implying that the maximum buffer size (65536) was not enough to write the data?
Searching around Stack Overflow, I discovered that my problem is that the buffer created by the method:
boost::asio::buffer( str )
goes out of scope before the operation has a chance to finish sending all the data.
It seems like I can't use a boost::asio::mutable_buffer, but only a boost::asio::streambuf
Furthermore and more importantly, a second error complains about the actual boost::asio::async_write being passed a boost::asio::const_buffer OR boost::asio::mutable_buffer:
/usr/include/boost/asio/detail/consuming_buffers.hpp:164:5: error: no type named ‘const_iterator’ in ‘class boost::asio::mutable_buffer’
const_iterator;
^
/usr/include/boost/asio/detail/consuming_buffers.hpp:261:36: error: no type named ‘const_iterator’ in ‘class boost::asio::mutable_buffer’
typename Buffers::const_iterator begin_remainder_;
So I am left with only one choice: To use a boost::asio::streambuf
I've tried using:
boost::asio::streambuf _out_buffer;
as a class member, and then changed the respond method to:
std::ostream os( &_out_buffer );
os << str;

boost::asio::async_write(
    socket_,
    _out_buffer,
    boost::asio::transfer_exactly( bytes ),
    boost::bind(
        &TCPConnection::handle_write,
        shared_from_this(),
        boost::asio::placeholders::error,
        boost::asio::placeholders::bytes_transferred
    ));
However, although I get no errors, not all of the data is sent!
So I am guessing that the entire string is not written into the streambuf?
Alternatively, I would love to know the most elegant way to write data larger than 65536 bytes using boost::asio::async_write!
Alex, you are misunderstanding asio async operations. Your problem is all about the lifetime of the buffer and the socket.
The buffer has to stay alive and the socket has to stay open for the whole transmission time (from the asio::async_write call until the handle_write callback is invoked by the Asio io_service dispatcher).
To better understand how it works, consider that every time you start some boost::asio::async_{operation}, you are posting a pointer to the data for the operation and a pointer to the callback function onto the job queue. It is Asio's decision when to execute your job (but of course it tries to do it as fast as possible =)). Only when the whole (possibly big) I/O operation completes does Asio inform you via the specified callback. You can then release the resources freely.
So, to make your code work, you have to ensure that std::string str still exists and socket_ is not closed until the handle_write callback runs. You can replace the stack-allocated std::string str variable with a member variable of the class that aggregates socket_, and move the socket_.close(); line from respond to handle_write.
Hope I helped you.
P.S. When you do boost::asio::buffer( str ), you don't copy the content of the string; you just create a thin wrapper over the string's data.
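A sketch of that suggestion (member string plus closing the socket in the completion handler); the surrounding class details are assumed from the question, not working code taken from it:

class TCPConnection : public boost::enable_shared_from_this< TCPConnection >
{
    // ...
    boost::asio::ip::tcp::socket socket_;
    std::string                  reply_;    // lives as long as the connection does

    void respond( const std::string message )
    {
        reply_ = "<?xml version=\"1.0\"?>";
        // ... append the FIFO reply or "<Error/>" exactly as before ...

        boost::asio::async_write(
            socket_,
            boost::asio::buffer( reply_ ),          // thin wrapper over reply_'s bytes
            boost::bind( &TCPConnection::handle_write,
                         shared_from_this(),        // keeps *this (and reply_) alive
                         boost::asio::placeholders::error,
                         boost::asio::placeholders::bytes_transferred ) );
    }

    void handle_write( const boost::system::error_code & error,
                       std::size_t bytes_transferred )
    {
        if ( !error )
            socket_.close();                        // close only after the full write
    }
};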
The code:
_out_buffer( static_cast<void*>( &str.front() ), bytes );
is only valid when initializing _out_buffer, i.e. before the body of your class's constructor begins.
That code is equivalent to
_out_buffer.operator()( static_cast<void*>(&str.front()), bytes )
Of course there is no such operator in class mutable_buffer, and that's what the compiler is complaining about.
I think the simplest thing to do (but not the best) is to change that line to:
_out_buffer = boost::asio::mutable_buffer(
                  static_cast<void*>( &str.front() ),
                  bytes
              );

boost::asio async server design

Currently I'm using a design where the server reads the first 4 bytes of the stream, then reads N bytes after decoding the header.
But I found that the time between the first async_read and the second read is 3-4 ms. I just printed timestamps from the callbacks to the console to measure it. I sent 10 bytes of data in total. Why does it take so much time to read?
I am running it in debug mode, but I think that one connection in debug is not enough to cause a 3 ms delay between reads from the socket. Maybe I need another approach to cut the TCP stream into "packets"?
UPDATE: here is some code:
void parseHeader(const boost::system::error_code& error)
{
    cout << "[parseHeader] " << lib::GET_SERVER_TIME() << endl;
    if (error) {
        close();
        return;
    }
    GenTCPmsg::header result = msg.parseHeader();
    if (result.error == GenTCPmsg::parse_error::__NO_ERROR__) {
        msg.setDataLength(result.size);
        boost::asio::async_read(*socket,
            boost::asio::buffer(msg.data(), result.size),
            (*_strand).wrap(
                boost::bind(&ConnectionInterface::parsePacket, shared_from_this(), boost::asio::placeholders::error)));
    } else {
        close();
    }
}

void parsePacket(const boost::system::error_code& error)
{
    cout << "[parsePacket] " << lib::GET_SERVER_TIME() << endl;
    if (error) {
        close();
        return;
    }
    protocol->parsePacket(msg);
    msg.flush();
    boost::asio::async_read(*socket,
        boost::asio::buffer(msg.data(), config::HEADER_SIZE),
        (*_strand).wrap(
            boost::bind(&ConnectionInterface::parseHeader, shared_from_this(), boost::asio::placeholders::error)));
}
As you can see, the unix timestamps differ by 3-4 ms. I want to understand why so much time elapses between parseHeader and parsePacket. This is not a client problem; the total data is 10 bytes, but I can't send much more, and the delay is exactly between the calls. I'm using a Flash client, version 11. What I do is just send a ByteArray through an opened socket. I'm not sure the delay is on the client; I send all 10 bytes at once. How can I debug where the actual delay is?
There are far too many unknowns to identify the root cause of the delay from the posted code. Nevertheless, there are a few approaches and considerations that can be taken to help identify the problem:
Enable handler tracking for Boost.Asio 1.47+. Simply define BOOST_ASIO_ENABLE_HANDLER_TRACKING and Boost.Asio will write debug output, including timestamps, to the standard error stream. These timestamps can be used to help filter out delays introduced by application code (parseHeader(), parsePacket(), etc.).
Verify that byte-ordering is being handled properly (a short sketch follows after these notes). For example, if the protocol defines the header's size field as two bytes in network-byte-order and the server is handling the field as a raw short, then upon receiving a message that has a body size of 10:
A big-endian machine will call async_read reading 10 bytes. The read operation should complete quickly as the socket already has the 10 byte body available for reading.
A little-endian machine will call async_read reading 2560 bytes. The read operation will likely remain outstanding, as far more bytes are trying to be read than is intended.
Use tracing tools such as strace, ltrace, etc.
Modify Boost.Asio, adding timestamps throughout the callstack. Boost.Asio is shipped as a header-file only library. Thus, users may modify it to provide as much verbosity as desired. While not the cleanest or easiest of approaches, adding a print statement with timestamps throughout the callstack may help provide visibility into timing.
Try duplicating the behavior in a short, simple, self-contained example. Start with the simplest of examples to determine if the delay is systematic. Then, iteratively expand upon the example so that it becomes closer to the real code with each iteration.
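As a small, self-contained illustration of the byte-ordering note above - it assumes a two-byte size field in network byte order, which is an assumption for illustration, not something the question states:

#include <arpa/inet.h>   // ntohs (use <winsock2.h> on Windows)
#include <cstring>
#include <iostream>
#include <stdint.h>

int main()
{
    // Pretend this is the 4-byte header just read from the socket; the first
    // two bytes carry the body size (10) in network byte order.
    unsigned char header[ 4 ] = { 0x00, 0x0A, 0x00, 0x00 };

    uint16_t size_be;
    std::memcpy( &size_be, header, sizeof size_be );

    std::size_t body_size = ntohs( size_be );   // 10 on both endiannesses
    std::cout << "body_size: " << body_size << std::endl;

    // Interpreting the raw bytes without ntohs() would yield 10 on a
    // big-endian host, but 0x0A00 == 2560 on a little-endian one.
}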
Here is a simple example from which I started:
#include <iostream>

#include <boost/array.hpp>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/make_shared.hpp>
#include <boost/shared_ptr.hpp>

class tcp_server
    : public boost::enable_shared_from_this< tcp_server >
{
private:
    enum
    {
        header_size = 4,
        data_size   = 10,
        buffer_size = 1024,
        max_stamp   = 50
    };

    typedef boost::asio::ip::tcp tcp;

public:
    typedef boost::array< boost::posix_time::ptime, max_stamp > time_stamps;

public:
    tcp_server( boost::asio::io_service& service,
                unsigned short port )
        : strand_( service ),
          acceptor_( service, tcp::endpoint( tcp::v4(), port ) ),
          socket_( service ),
          index_( 0 )
    {}

    /// @brief Returns collection of timestamps.
    time_stamps& stamps()
    {
        return stamps_;
    }

    /// @brief Start the server.
    void start()
    {
        acceptor_.async_accept(
            socket_,
            boost::bind( &tcp_server::handle_accept, this,
                         boost::asio::placeholders::error ) );
    }

private:
    /// @brief Accept connection.
    void handle_accept( const boost::system::error_code& error )
    {
        if ( error )
        {
            std::cout << error.message() << std::endl;
            return;
        }
        read_header();
    }

    /// @brief Read header.
    void read_header()
    {
        boost::asio::async_read(
            socket_,
            boost::asio::buffer( buffer_, header_size ),
            boost::bind( &tcp_server::handle_read_header, this,
                         boost::asio::placeholders::error,
                         boost::asio::placeholders::bytes_transferred ) );
    }

    /// @brief Handle reading header.
    void
    handle_read_header( const boost::system::error_code& error,
                        std::size_t bytes_transferred )
    {
        if ( error )
        {
            std::cout << error.message() << std::endl;
            return;
        }

        // If no more stamps can be recorded, then stop the async-chain so
        // that io_service::run can return.
        if ( !record_stamp() ) return;

        // Read data.
        boost::asio::async_read(
            socket_,
            boost::asio::buffer( buffer_, data_size ),
            boost::bind( &tcp_server::handle_read_data, this,
                         boost::asio::placeholders::error,
                         boost::asio::placeholders::bytes_transferred ) );
    }

    /// @brief Handle reading data.
    void handle_read_data( const boost::system::error_code& error,
                           std::size_t bytes_transferred )
    {
        if ( error )
        {
            std::cout << error.message() << std::endl;
            return;
        }

        // If no more stamps can be recorded, then stop the async-chain so
        // that io_service::run can return.
        if ( !record_stamp() ) return;

        // Start reading header again.
        read_header();
    }

    /// @brief Record time stamp.
    bool record_stamp()
    {
        stamps_[ index_++ ] = boost::posix_time::microsec_clock::local_time();
        return index_ < max_stamp;
    }

private:
    boost::asio::io_service::strand strand_;
    tcp::acceptor acceptor_;
    tcp::socket socket_;
    boost::array< char, buffer_size > buffer_;
    time_stamps stamps_;
    unsigned int index_;
};

int main()
{
    boost::asio::io_service service;

    // Create and start the server.
    boost::shared_ptr< tcp_server > server =
        boost::make_shared< tcp_server >( boost::ref( service ), 33333 );
    server->start();

    // Run. This will exit once enough time stamps have been sampled.
    service.run();

    // Iterate through the stamps.
    tcp_server::time_stamps& stamps = server->stamps();
    typedef tcp_server::time_stamps::iterator stamp_iterator;
    using boost::posix_time::time_duration;
    for ( stamp_iterator iterator = stamps.begin() + 1,
                         end      = stamps.end();
          iterator != end;
          ++iterator )
    {
        // Obtain the delta between the current stamp and the previous.
        time_duration delta = *iterator - *(iterator - 1);
        std::cout << "Delta: " << delta.total_milliseconds() << " ms"
                  << std::endl;
    }
    // Calculate the total delta.
    time_duration delta = *stamps.rbegin() - *stamps.begin();
    std::cout <<   "Total"
              << "\n  Start: " << *stamps.begin()
              << "\n  End:   " << *stamps.rbegin()
              << "\n  Delta: " << delta.total_milliseconds() << " ms"
              << std::endl;
}
A few notes about the implementation:
There is only one thread (main) and one asynchronous chain read_header->handle_read_header->handle_read_data. This should minimize the amount of time a ready-to-run handler spends waiting for an available thread.
To focus on boost::asio::async_read, noise is minimized by:
Using a pre-allocated buffer.
Not using shared_from_this() or strand::wrap.
Recording the timestamps, and performing the processing post-collection.
I compiled on CentOS 5.4 using gcc 4.4.0 and Boost 1.50. To drive the data, I opted to send 1000 bytes using netcat:
$ ./a.out > output &
[1] 18623
$ echo "$(for i in {0..1000}; do echo -n "0"; done)" | nc 127.0.0.1 33333
[1]+ Done ./a.out >output
$ tail output
Delta: 0 ms
Delta: 0 ms
Delta: 0 ms
Delta: 0 ms
Delta: 0 ms
Delta: 0 ms
Total
Start: 2012-Sep-10 21:22:45.585780
End: 2012-Sep-10 21:22:45.586716
Delta: 0 ms
Observing no delay, I expanded upon the example by modifying the boost::asio::async_read calls, replacing this with shared_from_this() and wrapping the ReadHandlers with strand_.wrap(). I ran the updated example and still observed no delay. Unfortunately, that is as far as I could get based on the code posted in the question.
Consider expanding upon the example, adding in a piece from the real implementation with each iteration. For example:
Start with using the msg variable's type to control the buffer.
Next, send valid data, and introduce the parseHeader() and parsePacket() functions.
Finally, introduce the lib::GET_SERVER_TIME() print.
If the example code is as close as possible to the real code, and no delay is being observed with boost::asio::async_read, then the ReadHandlers may be ready-to-run in the real code, but they are waiting on synchronization (the strand) or a resource (a thread), resulting in a delay:
If the delay is the result of synchronization with the strand, then consider Robin's suggestion by reading a larger block of data to potentially reduce the amount of reads required per-message.
If the delay is the result of waiting for a thread, then consider having an additional thread call io_service::run().
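For the second case, a minimal sketch of adding one extra worker thread (the run_service helper is just a local convenience, not part of Boost.Asio; handlers that must not run concurrently still need a strand):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

void run_service( boost::asio::io_service* service )
{
    service->run();
}

int main()
{
    boost::asio::io_service service;

    // ... set up the server / async chain here ...

    // A second thread gives ready-to-run handlers another chance to be
    // picked up without waiting for the first thread to become free.
    boost::thread worker( boost::bind( &run_service, &service ) );
    service.run();     // the main thread participates as well
    worker.join();
}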
One thing that makes Boost.Asio awesome is using the async feature to the fullest. Relying on a specific number of bytes read in one batch, possibly ditching some of what could already have been read, isn't really what you should be doing.
Instead, look at the example for the webserver especially this: http://www.boost.org/doc/libs/1_51_0/doc/html/boost_asio/example/http/server/connection.cpp
A boost tribool is used to either a) complete the request if all data is available in one batch, b) ditch it if it's available but not valid, or c) just read more when the io_service chooses to, if the request was incomplete. The connection object is shared with the handler through a shared pointer.
Why is this superior to most other methods? You can possibly save the time between reads by already parsing the request. This is sadly not followed through in the example, but ideally you'd thread the handler so it can work on the data already available while the rest is added to the buffer. The only time it blocks is when the data is incomplete.
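A toy sketch of that three-way tribool branching - the parser and the print statements here are placeholders standing in for application code, not the actual example's API:

#include <boost/logic/tribool.hpp>
#include <iostream>
#include <string>

// 'true' = complete, 'false' = malformed, 'indeterminate' = need more bytes.
boost::logic::tribool parse_request( const std::string & buffered )
{
    if ( buffered.find( "\r\n\r\n" ) != std::string::npos )
        return true;                         // whole request present
    if ( buffered.find( '\0' ) != std::string::npos )
        return false;                        // clearly not a valid request
    return boost::logic::indeterminate;      // keep reading
}

void on_read( const std::string & buffered )
{
    boost::logic::tribool result = parse_request( buffered );
    if ( result )
        std::cout << "complete: handle it now\n";
    else if ( !result )
        std::cout << "invalid: ditch it\n";
    else
        std::cout << "incomplete: schedule another async_read\n";
}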
Hope this helps, can't shed any light on why there is a 3ms delay between reads though.

async_receive_from stops receiving after a few packets under Linux

I have a setup with multiple peers broadcasting udp packets (containing images) every 200ms (5fps).
While receiving both the local stream and the external streams works fine under Windows, the same code (except for the socket->cancel(); under Windows XP, see the comment in the code) produces rather strange behavior under Linux:
The first few (5~7) packets sent by another machine (when this machine starts streaming) are received as expected;
After this, the packets from the other machine are received after irregular, long intervals (12s, 5s, 17s, ...) or get a time out (defined after 20 seconds). At certain moments, there is again a burst of (3~4) packets received as expected.
The packets sent by the machine itself are still being received as expected.
Using Wireshark, I see both local and external packets arriving as they should, with correct time intervals between consecutive packets. The behavior also presents itself when the local machine is only listening to a single other stream, with the local stream disabled.
This is some code from the receiver (with some updates as suggested below, thanks!):
Receiver::Receiver(port p)
{
    this->port = p;
    this->stop = false;
}

int Receiver::run()
{
    io_service io_service;
    boost::asio::ip::udp::socket socket(
        io_service,
        boost::asio::ip::udp::endpoint(boost::asio::ip::udp::v4(),
                                       this->port));
    while(!stop)
    {
        const int bufflength = 65000;
        int timeout = 20000;
        char sockdata[bufflength];
        boost::asio::ip::udp::endpoint remote_endpoint;
        int rcvd;

        bool read_success = this->receive_with_timeout(
            sockdata, bufflength, &rcvd, &socket, remote_endpoint, timeout);

        if(read_success)
        {
            std::cout << "read success " << remote_endpoint.address().to_string() << std::endl;
        }
        else
        {
            std::cout << "read fail" << std::endl;
        }
    }
    return 0;
}

void handle_receive_from(
    bool* toset, boost::system::error_code error, size_t length, int* outsize)
{
    if(!error || error == boost::asio::error::message_size)
    {
        *toset = length > 0 ? true : false;
        *outsize = length;
    }
    else
    {
        std::cout << error.message() << std::endl;
    }
}

// Update: error check
void handle_timeout( bool* toset, boost::system::error_code error)
{
    if(!error)
    {
        *toset = true;
    }
    else
    {
        std::cout << error.message() << std::endl;
    }
}

bool Receiver::receive_with_timeout(
    char* data, int buffl, int* outsize,
    boost::asio::ip::udp::socket *socket,
    boost::asio::ip::udp::endpoint &sender_endpoint, int msec_tout)
{
    bool timer_overflow = false;
    bool read_result = false;

    deadline_timer timer( socket->get_io_service() );
    timer.expires_from_now( boost::posix_time::milliseconds(msec_tout) );
    timer.async_wait( boost::bind(&handle_timeout, &timer_overflow,
                                  boost::asio::placeholders::error) );

    socket->async_receive_from(
        boost::asio::buffer(data, buffl), sender_endpoint,
        boost::bind(&handle_receive_from, &read_result,
                    boost::asio::placeholders::error,
                    boost::asio::placeholders::bytes_transferred, outsize));

    socket->get_io_service().reset();
    while ( socket->get_io_service().run_one())
    {
        if ( read_result )
        {
            timer.cancel();
        }
        else if ( timer_overflow )
        {
            // not to be used on Windows XP, Windows Server 2003, or earlier
            socket->cancel();
            // Update: added run_one()
            socket->get_io_service().run_one();
        }
    }
    // Update: added run_one()
    socket->get_io_service().run_one();

    return read_result;
}
When the timer exceeds the 20 seconds, the error message "Operation canceled" is returned, but it is difficult to get any other information about what is going on.
Can anyone identify a problem or give me some hints to get some more information about what is going wrong? Any help is appreciated.
Okay, what you're doing is that when you call receive_with_timeout, you're setting up two asynchronous requests (one for the receive, one for the timeout). When the first one completes, you cancel the other.
However, you never invoke io_service::run_one() again to allow its callback to complete. When you cancel an operation in Boost.Asio, it invokes the handler, usually with an error code indicating that the operation has been aborted or canceled. In this case, I believe you have a dangling handler once you destroy the deadline_timer, since it holds a pointer onto the stack where it stores its result.
The solution is to call run_one() again to process the canceled callback's result prior to exiting the function. You should also check the error code being passed to your timeout handler, and only treat it as a timeout if there was no error.
Also, in the case where you do have a timeout, you need to execute run_one so that the async_receive_from handler can execute and report that it was canceled.
After a clean installation with Xubuntu 12.04 instead of an old install with Ubuntu 10.04, everything now works as expected. Maybe it is because the new install runs a newer kernel, probably with improved networking? Anyway, a re-install with a newer version of the distribution solved my problem.
If anyone else gets unexpected network behavior with an older kernel, I would advise trying it on a system with a newer kernel installed.