Libevent writes to the socket only after second buffer_write - c++

Libevent is great and I love it so far. However, on an echo server, the write only reaches the socket on a second write. My writing is from another thread, a pump thread that talks to a db and does some minimal data massaging.
I verified this by setting up a callback for the write:
bufferevent_setcb( GetBufferEvent(), DataAvailable, DataWritten, HandleSocketError, this );
Calling bufferevent_flush( m_bufferEvent, EV_READ|EV_WRITE, BEV_NORMAL ) doesn't seem to have any effect.
Here is the setup, just in case I blew it somewhere. I have dramatically simplified the overhead in my code base in order to obtain some help. This includes initialization of sockets, my thread init, etc. This is a multi-threaded app, so there may be some problem there. I start with this:
m_LibEventInstance = event_base_new();
evthread_use_windows_threads();
m_listener = evconnlistener_new_bind( m_LibEventInstance,
OnAccept,
this,
LEV_OPT_CLOSE_ON_FREE | LEV_OPT_CLOSE_ON_EXEC | LEV_OPT_REUSEABLE,
-1,// no maximum number of backlog connections
(struct sockaddr*)&ListenAddress, socketSize );
if (!m_listener) {
perror("Couldn't create listener");
return false;
}
evconnlistener_set_error_cb( m_listener, OnSystemError );
AFAIK, this is copy and paste from samples so it should work. My OnAccept does the following:
void OnAccept( evconnlistener* listenerObj, evutil_socket_t newConnectionId, sockaddr* ClientAddr, int socklen, void* context )
{
// We got a new connection! Set up a bufferevent for it.
struct event_base* base = evconnlistener_get_base( listenerObj );
struct bufferevent* bufferEvent = bufferevent_socket_new( base, newConnectionId, BEV_OPT_CLOSE_ON_FREE );
bufferevent_setcb( GetBufferEvent(), DataAvailable, DataWritten,
HandleSocketError, this );
// We have to enable it before our callbacks will be called.
bufferevent_enable( GetBufferEvent(), EV_READ | EV_WRITE );
DisableNagle( m_connectionId );
}
Now, I simply respond to data coming in and store it in a buffer for later processing. This is a multi-threaded application, so I will process the data later, massage it, or return a response to the client.
void DataAvailable( struct bufferevent* bufferEventObj, void* arg )
{
const U32 MaxBufferSize = 8192;
MyObj* This = (MyObj*) arg;
U8 data[ MaxBufferSize ];
size_t numBytesreceived;
/* Read 8k at a time and send it to all connected clients. */
while( 1 )
{
numBytesreceived = bufferevent_read( bufferEventObj, data, sizeof( data ) );
if( numBytesreceived <= 0 ) // nothing to send
{
break;
}
if( This )
{
This->OnDataReceived( data, numBytesreceived );
}
}
}
The last thing that happens: once I have looked up my data and packaged it into a buffer, I do this on a threaded timeslice:
bufferevent_write( m_bufferEvent, buffer, bufferOffset );
It never, ever sends the first time. To get it to send, I have to send a second buffer full of data.
This behavior is killing me and I have spent a lot of hours on it. Any ideas?
//-------------------------------------------------------
I finally gave up and used this hack instead... there just was not enough info to tell me why libevent wasn't writing to the socket. This works just fine.
int result = send( m_connectionId, (const char* )buffer, bufferOffset, 0 );

I ran into this problem too, and it took me a day to solve.
The thread that calls event_base_dispatch sleeps until some event wakes it up. If you call bufferevent_write while it is asleep, the bufferevent's fd is added to the event list, but it is not handed to epoll until the next pass through the loop. So you must wake the dispatch thread after calling bufferevent_write. One way is to create a socket pair, bind an event to one end of it, and add that event to the event_base. Then write 1 byte to the other end whenever you need to wake the dispatch thread.
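The wake-up idea in this answer is not specific to libevent. As a rough sketch, the snippet below reproduces it with plain POSIX socketpair() and poll() standing in for the event_base; WakeableLoop and its member names are made up for illustration, and error handling is minimal.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not part of libevent): a loop that sleeps in poll()
// until someone writes one byte to the other end of a socket pair.
class WakeableLoop {
public:
    WakeableLoop() {
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds_) != 0) {
            fds_[0] = fds_[1] = -1;
        }
    }
    ~WakeableLoop() {
        if (fds_[0] != -1) close(fds_[0]);
        if (fds_[1] != -1) close(fds_[1]);
    }
    bool ok() const { return fds_[0] != -1; }

    // Producer side: call this after queuing data for the loop thread.
    bool wake() {
        char byte = 1;
        return write(fds_[1], &byte, 1) == 1;
    }

    // Loop side: blocks until a wake-up byte arrives or timeoutMs elapses.
    // Returns true only if a wake-up byte was consumed.
    bool waitForWake(int timeoutMs) {
        pollfd pfd{};
        pfd.fd = fds_[0];
        pfd.events = POLLIN;
        if (poll(&pfd, 1, timeoutMs) <= 0) return false;
        char byte = 0;
        return read(fds_[0], &byte, 1) == 1;
    }

private:
    int fds_[2];
};
```

In a real program wake() would be called from the pump thread and waitForWake() from the dispatch thread; with libevent, the read end of the pair would be registered as an event on the event_base instead of being polled by hand.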

Related

Async C++ socket

I am trying to write an asynchronous server to handle multiple users at the same time. The server is standing in the main thread listening for receiving data, in the same thread it receives them (large images) and creates a task to process this data, which it sends to the thread pool, and itself listens to the next image. Here is the code (Handle contains data processing that is performed on another thread):
while (true) {
cv::Mat data = ReceiveImage();
m_Pool.AddTask([=]() mutable {
Handle(std::move(data));
});
}
cv::Mat UDPServer::ReceiveImage() const {
...
try {
for (int i = 0; i < sz; i += num_bytes) {
num_bytes = ReceiveData((char*)&buf[0] + i, sz - i, from);
}
}
...
}
int UDPServer::ReceiveData(char* buf, int len, sockaddr_in& from) const {
socklen_t slen = sizeof(from);
int nReceivedBytes = recvfrom(m_Socket, buf, len, 0, (sockaddr*)&from, &slen);
if (nReceivedBytes == SOCKET_ERROR) {
throw std::runtime_error(RECEIVEFROM_ERROR.data());
}
return nReceivedBytes;
}
There is a problem with this approach: while accepting data from one user, another user can send his data, which will not be accepted.
A possible solution is to accept the data on a different thread. To do this, I want to receive ONLY a signal in the main thread that data has arrived, and transfer them to another thread to receive and send them to the thread pool. Something like Probe in MPI.
How can this be implemented with C++ sockets? I tried to find it on the internet, but nothing came of it. Or does anyone have a better solution to the problem?
TCP sockets work this way. There is a listened-to socket, call it P, and an actual communication socket, call it Q. The accept system call does this:
Q = accept(P, ...); // there are other parameters
// which are not important here
As soon as accept returns, you can launch an async task on Q, and continue listening on P. The two jobs will not interfere with each other. If another request comes in while you are still grinding away on Q, accept will just return another Q for another async task.
This whole idea doesn't work all that well for UDP because there are no persistent connections. Each packet is a communication session of its own. It doesn't make a lot of sense to asynchronously read a packet from a socket. Reading is an atomic operation, and packets are short enough. You can launch an asynchronous task to process each packet's data, there's nothing wrong with that. You can try to implement asynchronous reading by polling on a socket and launching an async task that reads the data as soon as it's ready, but this won't really simplify or speed up anything.
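To illustrate the point that reading a datagram is atomic, here is a small sketch in which each receive call returns exactly one whole datagram. receiveDatagram is a hypothetical helper, and the 64 KiB buffer size is an assumption; the same call works on a UDP socket or, for local testing, on an AF_UNIX datagram socket pair.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <string>

// Hypothetical helper: receive exactly one datagram (up to 64 KiB here).
// Message boundaries are preserved, so two sends are never merged into one
// receive. Returns false on error.
bool receiveDatagram(int sock, std::string& out)
{
    char buf[65536];
    ssize_t n = recv(sock, buf, sizeof buf, 0);
    if (n < 0) return false;
    out.assign(buf, static_cast<size_t>(n));
    return true;
}
```

The receiving thread can then hand each returned string to the thread pool and immediately go back to receiving, which is the "process each packet's data asynchronously" arrangement described above.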

How to stop a C++ blocking read call

I'm reading CAN-BUS traffic under SocketCAN and C++ in GNU/Linux. I've found that the read call is blocking, and I'm struggling to figure out how to stop my program properly when I don't want to keep reading.
Of course, I could hit Ctrl+C if I've invoked the program from the terminal, but the point is to find a way to do it programmatically when some condition is met (e.g., record for 5 seconds, or when some event happens, like a flag is raised). A timeout could work, or something like a signal, but I don't know how to do it properly.
// Read (blocking)
nbytes = read(s, &frame, sizeof(struct can_frame));
You don't.
Use a method like select or epoll to determine whether the socket has activity before beginning the read. Then it will not actually block.
The select/epoll call is itself blocking, but can be given a timeout so that you always have an escape route (or, in the case of epoll, the lovely epollfd for immediate triggering of a breakout).
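A minimal sketch of the timeout variant, using poll(2); readWithTimeout is a hypothetical helper name, and in the question's case fd would be the SocketCAN socket s and buf the can_frame.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cerrno>

// Hypothetical helper: wait up to timeoutMs for fd to become readable, then
// read at most len bytes. Returns the number of bytes read, 0 if the wait
// timed out, or -1 on error. (read() also returns 0 at end-of-file, so a
// caller that cares about EOF has to track that separately.)
ssize_t readWithTimeout(int fd, void* buf, size_t len, int timeoutMs)
{
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;

    int rc;
    do {
        // Note: a retry after EINTR restarts the full timeout.
        rc = poll(&pfd, 1, timeoutMs);
    } while (rc == -1 && errno == EINTR);

    if (rc <= 0) return rc;   // 0 = timed out, -1 = poll error
    return read(fd, buf, len);
}
```

Calling this in a loop with a timeout of, say, 100 ms lets the loop re-check an exit flag or elapsed time between reads without spinning.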
Read is always blocking... you want to only read if data is waiting... so consider doing a poll on the socket first to see if data is available and if so THEN read it. You can loop over doing the poll until you no longer want to read anymore...
// Requires <poll.h>
bool pollIn(int fd)
{
    struct pollfd pfd{};   // a single entry on the stack; no allocation needed
    pfd.fd = fd;
    pfd.events = POLLIN;
    // Timeout of 0: return immediately instead of blocking
    int pollReturn = poll(&pfd, 1, 0);
    return pollReturn > 0 && (pfd.revents & POLLIN);
}
The above returns true if there is data waiting at the socket file descriptor.
while(!exitCondition)
{
if(pollIn(fd))
{
nbytes = read(fd, &frame, sizeof(struct can_frame));
// other stuff you need to do with your read
}
}

Using timer with zmq

I am working on a project where I have to use zmq_poll. But I did not completely understand what it does.
So I also tried to implement it:
zmq_pollitem_t timer_open(void){
zmq_pollitem_t items[1];
if( items[0].socket == nullptr ){
printf("error socket %s: %s\n", zmq_strerror(zmq_errno()));
return;
}
else{
items[0].socket = gsock;
}
items[0].fd = -1;
items[0].events = ZMQ_POLLIN;
// get a timer
items[0].fd = timerfd_create( CLOCK_REALTIME, 0 );
if( items[0].fd == -1 )
{
printf("timerfd_create() failed: errno=%d\n", errno);
items[0].socket = nullptr;
return;
}
int rc = zmq_poll(items,1,-1);
if(rc == -1){
printf("error poll %s: %s\n", zmq_strerror(zmq_errno()));
return;
}
else
return items[0];
}
I am very new to this topic and I have to modify an old existing project and replace the functions with the one of zmq. On other websites I saw examples where they used two items and the zmq_poll function in an endless loop. I have read the documentation but still could not properly understand how this works. And these are the other two functions I have implemented. I do not know if it is the correct way to implement it like this:
void timer_set(zmq_pollitem_t items[] , long msec, ipc_timer_mode_t mode ) {
struct itimerspec t;
...
timerfd_settime( items[0].fd , 0, &t, NULL );
}
void timer_close(zmq_pollitem_t items[]){
if( items[0].fd != -1 )
close(items[0].fd);
items[0].socket = nullptr;
}
I am not sure if I need the zmq_poll function because I am using a timer.
EDIT:
void some_function_timer_example() {
// We want to wait on two timers
zmq_pollitem_t items[2] ;
// Setup first timer
ipc_timer_open_(&items[0]);
ipc_timer_set_(&items[0], 1000, IPC_TIMER_ONE_SHOT);
// Setup second timer
ipc_timer_open_(&items[1]);
ipc_timer_set_(&items[1], 1000, IPC_TIMER_ONE_SHOT);
// Now wait for the timers in a loop
while (1) {
//ipc_timer_set_(&items[0], 1000, IPC_TIMER_REPEAT);
//ipc_timer_set_(&items[1], 5000, IPC_TIMER_REPEAT);
int rc = zmq_poll (items, 2, -1);
assert (rc >= 0); /* Returned events will be stored in items[].revents */
if (items [0].revents & ZMQ_POLLIN) {
// Process task
std::cout << "revents: 1" << std::endl;
}
if (items [1].revents & ZMQ_POLLIN) {
// Process weather update
std::cout << "revents: 2" << std::endl;
}
}
}
Now it still prints very fast and does not wait. It only waits at the beginning. When timer_set is inside the loop it waits properly, but only if the waiting time is the same for both timers, like ipc_timer_set(&items[1], 1000, ...) and ipc_timer_set(&items[0], 1000, ...).
So how do I have to change this? Or is this the correct behavior?
zmq_poll works like select, but it allows some additional stuff. For instance you can select between regular synchronous file descriptors, and also special async sockets.
In your case you can use the timer fd as you have tried to do, but you need to make a few small changes.
First you have to consider how you will invoke these timers. I think the use case is that you want to create multiple timers and wait for them. This would typically be the function in your current code that uses a loop for the timer (either with select() or whatever else it might be doing).
It would be something like this:
void some_function() {
// We want to wait on two timers
zmq_pollitem_t items[2];
// Setup first timer
ipc_timer_open(&items[0]);
ipc_timer_set(&items[0], 1000, IPC_TIMER_REPEAT);
// Setup second timer
ipc_timer_open(&items[1]);
ipc_timer_set(&items[1], 5000, IPC_TIMER_ONE_SHOT);
// Now wait for the timers in a loop
while (1) {
int rc = zmq_poll (items, 2, -1);
assert (rc >= 0); /* Returned events will be stored in items[].revents */
}
}
Now, you need to fix the ipc_timer_open. It will be very simple - just create the timer fd.
// Takes a pointer to pre-allocated zmq_pollitem_t and returns 0 for success, -1 for error
int ipc_timer_open(zmq_pollitem_t *items){
items[0].socket = NULL;
items[0].events = ZMQ_POLLIN;
// get a timer
items[0].fd = timerfd_create( CLOCK_REALTIME, 0 );
if( items[0].fd == -1 )
{
printf("timerfd_create() failed: errno=%d\n", errno);
return -1; // error
}
return 0;
}
Edit: Added as reply to comment, since this is long:
From the documentation:
If both socket and fd are set in a single zmq_pollitem_t, the ØMQ socket referenced by socket shall take precedence and the value of fd shall be ignored.
So if you are passing the fd, you have to set socket to NULL. I am not even clear where gsock is coming from. Is this in the documentation? I couldn't find it.
And when will it break out of the while(1) loop?
This is application logic, and you have to code it according to what you require. zmq_poll just keeps returning every time one of the timers fires. In this example, zmq_poll returns every second because the first timer (which is a repeating one) keeps triggering. At 5 seconds it will also return because of the second timer (which is a one shot). It's up to you to decide when to exit the loop. Do you want this to go on infinitely? Do you need to check for a different condition to exit the loop? Do you want to do this, say, 100 times and then return? You can code whatever logic you want on top of this code.
And what kind of events are returned back
ZMQ_POLLIN since timer fds behave like readable file descriptors.
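One detail that may explain the "prints very fast" behaviour in the question's edit: a timerfd stays readable after it expires until its 8-byte expiration counter is read, so a poll loop that never reads the fd returns immediately on every pass. The sketch below shows this with plain poll(2) rather than zmq_poll, to keep it self-contained; makeOneShotTimer, timerReady and drainTimer are hypothetical names, and CLOCK_MONOTONIC is used here instead of the CLOCK_REALTIME in the code above.

```cpp
#include <poll.h>
#include <sys/timerfd.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical helper: create a timerfd that fires once after msec
// milliseconds (msec must be > 0; a zero value would disarm the timer).
// Returns the fd, or -1 on error.
int makeOneShotTimer(long msec)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd == -1) return -1;
    itimerspec t{};                        // it_interval == 0 means one shot
    t.it_value.tv_sec  = msec / 1000;
    t.it_value.tv_nsec = (msec % 1000) * 1000000L;
    if (timerfd_settime(fd, 0, &t, nullptr) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

// Hypothetical helper: true if fd becomes readable within timeoutMs.
bool timerReady(int fd, int timeoutMs)
{
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    return poll(&pfd, 1, timeoutMs) > 0 && (pfd.revents & POLLIN);
}

// Hypothetical helper: read the expiration counter, which also clears the
// fd's readable state. Returns the number of expirations, or 0 on error.
uint64_t drainTimer(int fd)
{
    uint64_t expirations = 0;
    if (read(fd, &expirations, sizeof expirations) != sizeof expirations)
        return 0;
    return expirations;
}
```

In the zmq_poll loop, the equivalent step would be to read 8 bytes from items[i].fd each time items[i].revents has ZMQ_POLLIN set, before polling again.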

How to measure and fix context switching bottlenecks?

I have a multi-threaded socket program. I use boost threadpool (http://threadpool.sourceforge.net/) for executing tasks. I create a TCP client socket per thread in threadpool. Whenever I send large amount of data say 500KB (message size), the throughput reduces significantly. I checked my code for:
1) Waits that might cause context-switching
2) Lock/Mutexes
For example, a 500KB message is divided into multiple lines and I send each line through the socket using ::send( ).
typedef std::list< std::string > LinesListType;
// now send the lines to the server
for ( LinesListType::const_iterator it = linesOut.begin( );
it!=linesOut.end( );
++it )
{
std::string line = *it;
if ( !line.empty( ) && '.' == line[0] )
{
line.insert( 0, "." );
}
SendData( line + CRLF );
}
SendData:
void SendData( const std::string& data )
{
try
{
uint32_t bytesToSendNo = data.length();
uint32_t totalBytesSent = 0;
ASSERT( m_socketPtr.get( ) != NULL )
while ( bytesToSendNo > 0 )
{
try
{
int32_t ret = m_socketPtr->Send( data.data( ) + totalBytesSent, bytesToSendNo );
if ( 0 == ret )
{
throw;
}
bytesToSendNo -= ret;
totalBytesSent += ret;
}
catch( )
{
}
}
}
catch()
{
}
}
Send Method in Client Socket:
int Send( const char* buffer, int length )
{
try
{
int bytes = 0;
do
{
bytes = ::send( m_handle, buffer, length, MSG_NOSIGNAL );
}
while ( bytes == -1 && errno == EINTR );
if ( bytes == -1 )
{
throw SocketSendFailed( );
}
return bytes;
}
catch( )
{
}
}
Invoking ::select() before sending caused context switches since ::select could block. Holding a lock on shared mutex caused parallel threads to wait and switch context. That affected the performance.
Is there a best practice for avoiding context switches especially in network programming? I have spent at least a week trying to figure out various tools with no luck (vmstat, callgrind in valgrind). Any tools on Linux would help measuring these bottlenecks?
In general, not related to networking, you need one thread for each resource that could be used in parallel. In other words, if you have a single network interface, a single thread is enough to service the network interface. Since you don't typically just receive or send data but also do something with it, your thread then switches to consume a different resource like e.g. the CPU for computations or the IO channel to the harddisk for storage or retrieval. This task then needs to be done in a different thread, while the single network thread keeps retrieving messages from the network.
As a consequence, your approach of creating a thread for each connection seems a simple way to keep things clean and separate, but it simply doesn't scale since it involves too much unnecessary context switching. Instead, keep the networking in one place if you can. Also, don't reinvent the wheel. There are tools like zeromq out there that serve several connections, assemble whole messages from fragmented network packets and only invoke a callback when a message has been completely received. It does so with good performance, so I'd suggest using this tool as a base for your communication. In addition, it provides a plethora of language bindings, so you can quickly prototype nodes using a scripting language and switch to C++ for performance later on.
Lastly, I'm afraid that the library you are using (which does not seem to be part of Boost!) is abandonware, i.e. its development is discontinued. I'm not sure of that, but looking at the changelog, they claim that they made it compatible to Boost 1.37, which is really old. Make sure that what you are using is worth your time!

Exit an infinite looping thread elegantly

I keep running into this problem of trying to run a thread with the following properties:
1. runs in an infinite loop, checking some external resource, e.g. data from the network or a device,
2. gets updates from its resource promptly,
3. exits promptly when asked to,
4. uses the CPU efficiently.
First approach
One solution I have seen for this is something like the following:
void class::run()
{
while(!exit_flag)
{
if (resource_ready)
use_resource();
}
}
This satisfies points 1, 2 and 3, but being a busy waiting loop, uses 100% CPU.
Second approach
A potential fix for this is to put a sleep statement in:
void class::run()
{
while(!exit_flag)
{
if (resource_ready)
use_resource();
else
sleep(a_short_while);
}
}
We now don't hammer the CPU, so we address 1 and 4, but we could wait up to a_short_while unnecessarily when the resource is ready or we are asked to quit.
Third approach
A third option is to do a blocking read on the resource:
void class::run()
{
while(!exit_flag)
{
obtain_resource();
use_resource();
}
}
This will satisfy 1, 2, and 4 elegantly, but now we can't ask the thread to quit if the resource does not become available.
Question
The best approach seems to be the second one, with a short sleep, so long as the tradeoff between CPU usage and responsiveness can be achieved.
However, this still seems suboptimal, and inelegant to me. This seems like it would be a common problem to solve. Is there a more elegant way to solve it? Is there an approach which can address all four of those requirements?
This depends on the specifics of the resources the thread is accessing, but basically, to do it efficiently with minimal latency, the resources need to provide an API for doing an interruptible blocking wait.
On POSIX systems, you can use the select(2) or poll(2) system calls to do that, if the resources you're using are files or file descriptors (including sockets). To allow the wait to be preempted, you also create a dummy pipe which you can write to.
For example, here's how you might wait for a file descriptor or socket to become ready or for the code to be interrupted:
// Dummy pipe used for sending interrupt message:
// interrupt_pipe[0] is the read end, interrupt_pipe[1] is the write end
int interrupt_pipe[2];
int should_exit = 0;
void class::run()
{
    // Set up the interrupt pipe
    if (pipe(interrupt_pipe) != 0)
        ; // Handle error
    int fd = ...; // File descriptor or socket etc.
    while (!should_exit)
    {
        // Set up a file descriptor set with fd and the read end of the dummy
        // pipe in it
        fd_set fds;
        FD_ZERO(&fds);
        FD_SET(fd, &fds);
        FD_SET(interrupt_pipe[0], &fds);
        int maxfd = max(fd, interrupt_pipe[0]);
        // Wait until one of the file descriptors is ready to be read
        int num_ready = select(maxfd + 1, &fds, NULL, NULL, NULL);
        if (num_ready == -1)
            ; // Handle error
        if (FD_ISSET(fd, &fds))
        {
            // fd can now be read/recv'ed from without blocking
            read(fd, ...);
        }
    }
}
void class::interrupt()
{
    should_exit = 1;
    // Send a dummy message to the write end of the pipe to wake up the
    // select() call
    char msg = 0;
    write(interrupt_pipe[1], &msg, 1);
}
class::~class()
{
    // Clean up pipe etc.
    close(interrupt_pipe[0]);
    close(interrupt_pipe[1]);
}
If you're on Windows, the select() function still works for sockets, but only for sockets, so you should use WaitForMultipleObjects to wait on a resource handle and an event handle. For example:
// Event used for sending interrupt message
HANDLE interrupt_event;
int should_exit = 0;
void class::run()
{
// Set up the interrupt event as an auto-reset event
interrupt_event = CreateEvent(NULL, FALSE, FALSE, NULL);
if (interrupt_event == NULL)
; // Handle error
HANDLE resource = ...; // File or resource handle etc.
while (!should_exit)
{
// Wait until one of the handles becomes signaled
HANDLE handles[2] = {resource, interrupt_event};
DWORD which_ready = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
if (which_ready == WAIT_FAILED)
; // Handle error
else if (which_ready == WAIT_OBJECT_0)
{
// resource can now be read from without blocking
ReadFile(resource, ...);
}
}
}
void class::interrupt()
{
// Signal the event to wake up the waiting thread
should_exit = 1;
SetEvent(interrupt_event);
}
class::~class()
{
// Clean up event etc.
CloseHandle(interrupt_event);
}
You get an efficient solution if your obtain_resource() function supports a timeout value:
while(!exit_flag)
{
obtain_resource_with_timeout(a_short_while);
if (resource_ready)
use_resource();
}
This effectively combines the sleep() with the obtain_resource() call.
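As a sketch of what obtain_resource_with_timeout could look like when the resource is produced by another thread in the same process, std::condition_variable::wait_for gives a blocking wait that a notification ends immediately. ResourceSlot and its members are illustrative names, not something from this answer.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical single-slot resource guarded by a condition variable.
class ResourceSlot {
public:
    // Producer side: publish a value and wake the waiting thread.
    void provide(int value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            value_ = value;
            ready_ = true;
        }
        cv_.notify_one();
    }
    // Ask the waiting thread to stop; this wakes it immediately too.
    void requestExit() {
        {
            std::lock_guard<std::mutex> lock(m_);
            exit_ = true;
        }
        cv_.notify_one();
    }
    bool exitRequested() {
        std::lock_guard<std::mutex> lock(m_);
        return exit_;
    }
    // Consumer side: block until a value arrives, exit is requested, or the
    // timeout elapses. Returns true and fills `out` only if a value arrived.
    bool obtainWithTimeout(int& out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait_for(lock, timeout, [this] { return ready_ || exit_; });
        if (!ready_) return false;
        out = value_;
        ready_ = false;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int value_ = 0;
    bool ready_ = false;
    bool exit_ = false;
};
```

The thread's loop then becomes: while exitRequested() is false, call obtainWithTimeout() and use the value when it returns true. Because requestExit() notifies the condition variable, the thread does not have to wait out the timeout before it notices the request.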
Check out the manpage for nanosleep:
If the nanosleep() function returns because it has been interrupted by a signal, the function returns a value of -1 and sets errno to indicate the interruption.
In other words, you can interrupt sleeping threads by sending a signal (the sleep manpage says something similar). This means you can use your 2nd approach, and use an interrupt to immediately wake the thread if it's sleeping.
Use the Gang of Four Observer Pattern:
http://home.comcast.net/~codewrangler/tech_info/patterns_code.html#Observer
Callback, don't block.
Self-Pipe trick can be used here.
http://cr.yp.to/docs/selfpipe.html
Assuming that you are reading the data from file descriptor.
Create a pipe and select() for readability on the pipe input as well as on the resource you are interested.
Then when data comes on resource, the thread wakes up and does the processing. Else it sleeps.
To terminate the thread send it a signal and in signal handler, write something on the pipe (I would say something which will never come from the resource you are interested in, something like NULL for illustrating the point). The select call returns and thread on reading the input knows that it got the poison pill and it is time to exit and calls pthread_exit().
EDIT: Better way will be just to see that the data came on the pipe and hence just exit rather than checking the value which came on that pipe.
The Win32 API uses more or less this approach:
someThreadLoop( ... )
{
MSG msg;
int retVal;
while( (retVal = ::GetMessage( &msg, TaskContext::winHandle_, 0, 0 )) > 0 )
{
::TranslateMessage( &msg );
::DispatchMessage( &msg );
}
}
GetMessage itself blocks until a message is received, so the thread uses no CPU while waiting. If WM_QUIT is received, it returns 0, which ends the loop and lets the thread function exit gracefully. This is a variant of the producer/consumer pattern mentioned elsewhere.
You can use any variant of a producer/consumer, and the pattern is often similar. One could argue that one would want to split the responsibility concerning quitting and obtaining of a resource, but OTOH quitting could depend on obtaining a resource too (or could be regarded as one of the resources - but a special one). I would at least abstract the producer consumer pattern and have various implementations thereof.
Therefore:
AbstractConsumer:
void AbstractConsumer::threadHandler()
{
do
{
try
{
process( dequeNextCommand() );
}
catch( const base_except& ex )
{
log( ex );
if( ex.isCritical() ){ throw; }
//else we don't want loop to exit...
}
catch( const std::exception& ex )
{
log( ex );
throw;
}
}
while( !terminated() );
}
virtual void /*AbstractConsumer::*/process( std::unique_ptr<Command>&& command ) = 0;
//Note:
// Either may or may not block until resource arrives, but typically blocks on
// a queue that is signalled as soon as a resource is available.
virtual std::unique_ptr<Command> /*AbstractConsumer::*/dequeNextCommand() = 0;
virtual bool /*AbstractConsumer::*/terminated() const = 0;
I usually encapsulate command to execute a function in the context of the consumer, but the pattern in the consumer is always the same.
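One possible backing store for dequeNextCommand() and terminated() is a queue that blocks while empty and can be closed to release the consumer. This is a minimal sketch under that assumption; ClosableQueue is a hypothetical name and is not part of the code above.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical closable queue: pop() blocks until an item arrives or the
// queue has been closed and drained.
template <typename T>
class ClosableQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            if (closed_) return;            // drop items pushed after close()
            items_.push_back(std::move(item));
        }
        cv_.notify_one();
    }
    // Wake every waiting consumer; pop() returns false once the queue is empty.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until an item is available or the queue is closed and empty.
    // Returns false only in the latter case.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> items_;
    bool closed_ = false;
};
```

With this design, quitting is just another event on the same wait: a consumer loop of the form "while pop() returns true, process the item" sleeps without using CPU, picks up new work as soon as it is pushed, and exits promptly after close().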
Any (well, at least most) of the approaches mentioned above will do the following: a thread is created, then it blocks waiting for the resource, then it is deleted.
If you're worried about efficiency, this is not the best approach when waiting for IO. On Windows at least, you'll allocate around 1 MB of memory in user mode, plus some in the kernel, for just one additional thread. What if you have many such resources? Having many waiting threads will also increase context switches and slow down your program. What if the resource takes longer to become available and many requests are made? You may end up with tons of waiting threads.
Now, the solution to this (again, on Windows, but I'm sure there should be something similar on other OSes) is to use a threadpool (the one provided by Windows). On Windows this will not only create a limited number of threads, it will also be able to detect when a thread is waiting for IO, steal the thread from there, and reuse it for other operations while waiting.
See http://msdn.microsoft.com/en-us/library/windows/desktop/ms686766(v=vs.85).aspx
Also, for more fine-grained control while still being able to give up the thread when waiting for IO, see IO completion ports (I think they use the threadpool internally anyway): http://msdn.microsoft.com/en-us/library/windows/desktop/aa365198(v=vs.85).aspx