How to measure and fix context switching bottlenecks? - c++

I have a multi-threaded socket program. I use boost threadpool (http://threadpool.sourceforge.net/) for executing tasks, and I create a TCP client socket per thread in the thread pool. Whenever I send a large amount of data, say 500KB (message size), the throughput drops significantly. I checked my code for:
1) Waits that might cause context-switching
2) Lock/Mutexes
For example, a 500KB message is divided into multiple lines and I send each line through the socket using ::send( ).
typedef std::list< std::string > LinesListType;
// now send the lines to the server
for ( LinesListType::const_iterator it = linesOut.begin( );
      it != linesOut.end( );
      ++it )
{
    std::string line = *it;
    if ( !line.empty( ) && '.' == line[0] )
    {
        line.insert( 0, "." );
    }
    SendData( line + CRLF );
}
SendData:
void SendData( const std::string& data )
{
    try
    {
        uint32_t bytesToSendNo = data.length();
        uint32_t totalBytesSent = 0;
        ASSERT( m_socketPtr.get( ) != NULL );
        while ( bytesToSendNo > 0 )
        {
            try
            {
                int32_t ret = m_socketPtr->Send( data.data( ) + totalBytesSent, bytesToSendNo );
                if ( 0 == ret )
                {
                    // a bare "throw;" with no active exception would call std::terminate
                    throw std::runtime_error( "send returned 0" );
                }
                bytesToSendNo -= ret;
                totalBytesSent += ret;
            }
            catch ( ... )
            {
                // swallowed: the loop retries the remaining bytes
            }
        }
    }
    catch ( ... )
    {
    }
}
Send Method in Client Socket:
int Send( const char* buffer, int length )
{
    try
    {
        int bytes = 0;
        do
        {
            bytes = ::send( m_handle, buffer, length, MSG_NOSIGNAL );
        }
        while ( bytes == -1 && errno == EINTR );

        if ( bytes == -1 )
        {
            throw SocketSendFailed( );
        }
        return bytes;
    }
    catch ( ... )
    {
        return -1; // without this, control could fall off the end of a non-void function
    }
}
Invoking ::select() before sending caused context switches since ::select() can block. Holding a lock on a shared mutex caused parallel threads to wait and switch context. That affected performance.
Is there a best practice for avoiding context switches, especially in network programming? I have spent at least a week trying various tools (vmstat, callgrind in valgrind) with no luck. Are there any tools on Linux that would help measure these bottlenecks?

In general, not related to networking, you need one thread for each resource that could be used in parallel. In other words, if you have a single network interface, a single thread is enough to service the network interface. Since you don't typically just receive or send data but also do something with it, your thread then switches to consume a different resource, e.g. the CPU for computations or the IO channel to the hard disk for storage or retrieval. That work then needs to be done in a different thread, while the single network thread keeps retrieving messages from the network.
As a consequence, your approach of creating a thread per connection seems a simple way to keep things clean and separate, but it simply doesn't scale, since it involves too much unnecessary context switching. Instead, keep the networking in one place if you can. Also, don't reinvent the wheel. There are tools like zeromq out there that serve several connections, assemble whole messages from fragmented network packets and only invoke a callback when a message has been completely received. And they do so performantly, so I'd suggest using such a tool as the base for your communication. In addition, zeromq provides a plethora of language bindings, so you can quickly prototype nodes in a scripting language and switch to C++ for performance later on.
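To make that concrete, a minimal sketch of the single-network-thread idea, assuming the cppzmq C++ binding (the PULL socket type and the port are just for illustration):
#include <zmq.hpp>
#include <iostream>

int main()
{
    zmq::context_t context{ 1 };
    zmq::socket_t socket{ context, ZMQ_PULL };   // one socket, one servicing thread
    socket.bind( "tcp://*:5555" );

    while ( true )
    {
        zmq::message_t msg;
        // recv() hands back a whole ZeroMQ message, so the application
        // never has to reassemble fragmented network packets itself
        zmq::recv_result_t got = socket.recv( msg, zmq::recv_flags::none );
        if ( got.has_value() )
        {
            std::cout << "got a " << msg.size() << " byte message" << std::endl;
        }
    }
}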
Lastly, I'm afraid that the library you are using (which does not seem to be part of Boost!) is abandonware, i.e. its development has been discontinued. I'm not sure of that, but looking at the changelog, they claim that they made it compatible with Boost 1.37, which is really old. Make sure that what you are using is worth your time!

Related

How to read ZeroMQ return values from .recv() and .send() methods in c++?

I'm trying to write a C++ class for communicating between two computers via ZeroMQ.
To be able to handle errors, I am trying to read the return values of the .recv() and .send() methods, but I get the following error:
error: cannot convert 'zmq::send_result_t' {aka 'zmq::detail::trivial_optional<unsigned int>'} to 'int' in assignment
ret = msocket.send(reply, zmq::send_flags::none);
The code looks like this:
Publisher::Publisher(dataHandler & mdatahandler) : datahandler(mdatahandler)
{
    // construct a REP (reply) socket and bind to interface
    socketState.bind("tcp://*:5555");
    //socketAngles.bind("tcp://*:5556");
    //socketCurrents.bind("tcp://*:5557");
}

Publisher::~Publisher()
{
    socketState.close();
    //socketAngles.close();
    //socketCurrents.close();
}

std::string Publisher::transfer(zmq::socket_t& msocket, std::string replyString,
                                int receiveFlag = 0)
{
    zmq::send_result_t ret = 0;
    if (receiveFlag)
    {
        zmq::message_t receivedData;
        ret = msocket.recv(receivedData, zmq::recv_flags::none);
        if (verbose)
        {
            std::cout << "Received " << receivedData.to_string() << std::endl;
        }
        return receivedData.to_string();
    }
    zmq::message_t reply{ replyString.cbegin(), replyString.cend() };
    // send the reply to the client
    ret = msocket.send(reply, zmq::send_flags::none);
    if (ret == -1)
    {
        std::cout << zmq_strerror(errno) << std::endl;
    }
}
the socket is defined as
zmq::context_t context{ 1 };
zmq::socket_t socketState{ context, ZMQ_REP };
How can I reliably catch errors and is there a better way of handling errors if they occur?
Edit: I added the zmq::send_result_t, but how can I do anything with it? I can't compare it to anything and I can't print it either.
zmq::recv_result_t is based on trivial_optional<unsigned int>.
trivial_optional<T> is a class template for an object that may or may not contain a value. An instance of trivial_optional<unsigned int> is interrogated with bool trivial_optional<unsigned int>::has_value() to see whether there is a value.
If there is a value, it is extracted using T trivial_optional<unsigned int>::operator*() or T trivial_optional<unsigned int>::value().
zmq::recv_result_t ret(msocket.recv(receivedData, zmq::recv_flags::none));
if (!ret.has_value())
{
    // msocket had nothing to read: recv() hit EAGAIN, which can only happen
    // with zmq::recv_flags::dontwait or a receive timeout; otherwise the
    // contained value is the number of bytes received
    ....
}
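Putting it together, a rough sketch of checking the optional result and catching ZeroMQ's exceptions, assuming the cppzmq binding from the question (the dontwait flag is only there to make the empty-result case reachable):
try
{
    zmq::message_t receivedData;
    zmq::recv_result_t ret = msocket.recv( receivedData, zmq::recv_flags::dontwait );
    if ( !ret.has_value() )
    {
        // nothing to read right now (EAGAIN)
    }
    else
    {
        // ret.value() is the number of bytes received
        std::cout << "Received " << ret.value() << " bytes" << std::endl;
    }
}
catch ( const zmq::error_t& e )
{
    // other failures surface as exceptions carrying an errno-style code
    std::cout << e.what() << " (" << e.num() << ")" << std::endl;
}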
Q : "How can I reliably catch errors and is there a better way of handling errors if they occur?"
This can be split into two questions:
Part A: "How can I reliably catch errors"
First understand the language tools. There are exception-related tools, best practices, and other dos and don'ts. Obey them.
Part B: "a better way of handling errors"
The best way of handling errors is to avoid them completely - this alone does not save the Planet (you can read about Ms. Margaret Hamilton, who saved lives and national pride by doing this correctly for the Apollo Guidance Computer software, and about her genuine methodology, which covers the unavoidable, principally colliding cases).
The next, much weaker strategy is to design architectures (and then code) that thoroughly inspect the state of the system (return values, RTT times and other factors), so as to be continuously ready to handle an Exception as it happens and in full context with the state of the system - not to find yourself as uninformed as a blind person standing in the middle of a crossroads once the Exception is thrown... and it will be thrown at some later time, so be prepared a priori rather than panicking ex post with nothing but chaotic ad-hoc options.
Solution :
Step 1) Understand and master the language tools.
Step 2) Understand and master the ZeroMQ tools (no REP socket can ever start with .send(); see the small sketch after this answer).
Step 3) Understand and master the published ZeroMQ API; it has all the details needed for successful exception handling (preventive error-state indications - hidden gems among the { EINVAL | ETERM | ENOTSOCK | EINTR | ... } error states, explained for each and every API call method, in the due context of that method).
If still not convinced, at least read the fabulous Pieter Hintjens' book "Code Connected, Volume 1"; there one will get to the roots of what the Zen-of-Zero is all about.
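As a small illustration of Step 2, assuming the same cppzmq msocket as in the question: a REP socket must receive a request before it is allowed to send a reply.
zmq::message_t request;
msocket.recv( request, zmq::recv_flags::none );   // 1) a REP socket always receives first
std::string ack = "ack";
zmq::message_t reply{ ack.cbegin(), ack.cend() };
msocket.send( reply, zmq::send_flags::none );     // 2) then sends exactly one reply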

Execute a new process or multiples in a program

I am wondering what the best practice is for executing new processes (programs) from a running process. To be more specific, I am implementing a C/C++ job scheduler that has to run multiple binaries while communicating with them. Are fork and exec the common way, or is there a library that takes care of this?
You can use popen() to spawn the processes and communicate with them. In order to handle communication with many processes from a single parent process, use select() or poll() to multiplex the reading/writing of the file descriptors given to you by popen() (you can use fileno() to turn a FILE* into an integer file descriptor).
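A rough sketch of that approach (the command and buffer size are just for illustration; a real scheduler would keep several such FILE*/descriptor pairs and multiplex them with select() or poll() instead of reading synchronously):
#include <cstdio>
#include <iostream>

int main()
{
    FILE* child = popen("ls .", "r");      // spawn a child process and read its stdout
    if (child == NULL)
    {
        perror("popen");
        return 1;
    }

    int fd = fileno(child);                // integer descriptor usable with select()/poll()
    (void)fd;                              // not multiplexed in this tiny example

    char buffer[256];
    while (fgets(buffer, sizeof(buffer), child) != NULL)
        std::cout << buffer;

    int status = pclose(child);            // reap the child and collect its exit status
    std::cout << "child exited with status " << status << std::endl;
    return 0;
}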
If you want a library to abstract much of this for you, I suggest libuv. Here's a complete example program I whipped up, largely following the docs at https://nikhilm.github.io/uvbook/processes.html#spawning-child-processes:
#include <cstdio>
#include <cstdlib>
#include <inttypes.h>
#include <uv.h>

static void alloc_buffer(uv_handle_t *handle, size_t suggested_size, uv_buf_t *buf)
{
    *buf = uv_buf_init((char*)malloc(suggested_size), suggested_size);
}

void echo_read(uv_stream_t *server, ssize_t nread, const uv_buf_t* buf)
{
    if (nread == -1) {
        fprintf(stderr, "error echo_read");
        return;
    }
    puts(buf->base);
}

static void on_exit(uv_process_t *req, int64_t exit_status, int term_signal)
{
    fprintf(stderr, "Process %d exited with status %" PRId64 ", signal %d\n",
            req->pid, exit_status, term_signal);
    uv_close((uv_handle_t*)req, NULL);
}

int main()
{
    uv_loop_t* loop = uv_default_loop();
    const int N = 3;
    uv_pipe_t channel[N];
    uv_process_t child_req[N];

    for (int ii = 0; ii < N; ++ii) {
        char* args[3];
        args[0] = const_cast<char*>("ls");
        args[1] = const_cast<char*>(".");
        args[2] = NULL;

        uv_pipe_init(loop, &channel[ii], 1);

        uv_stdio_container_t child_stdio[3]; // {stdin, stdout, stderr}
        child_stdio[STDIN_FILENO].flags = UV_IGNORE;
        child_stdio[STDOUT_FILENO].flags = uv_stdio_flags(UV_CREATE_PIPE | UV_WRITABLE_PIPE);
        child_stdio[STDOUT_FILENO].data.stream = (uv_stream_t*)&channel[ii];
        child_stdio[STDERR_FILENO].flags = UV_IGNORE;

        uv_process_options_t options = {};
        options.exit_cb = on_exit;
        options.file = "ls";
        options.args = args;
        options.stdio = child_stdio;
        options.stdio_count = sizeof(child_stdio) / sizeof(child_stdio[0]);

        int r;
        if ((r = uv_spawn(loop, &child_req[ii], &options))) {
            fprintf(stderr, "%s\n", uv_strerror(r));
            return EXIT_FAILURE;
        } else {
            fprintf(stderr, "Launched process with ID %d\n", child_req[ii].pid);
            uv_read_start((uv_stream_t*)&channel[ii], alloc_buffer, echo_read);
        }
    }

    return uv_run(loop, UV_RUN_DEFAULT);
}
The above will spawn three copies of ls to print the contents of the current directory. They all run asynchronously.
Okay, let's start.
There are a few ways to create another parallel task from one task, although I wouldn't call all of them processes.
Using the fork() system call
As you have already mentioned, fork() creates a new process from your parent process. There are a few good things and a few bad things about fork().
Good things
fork() creates a completely separate process, and on multi-core CPU systems it can achieve true parallelism.
fork() creates a child process with a different pid, which is handy if you ever want to kill that process explicitly.
The wait() and waitpid() system calls let the parent wait for the child.
The child's termination raises the SIGCHLD signal in the parent, and with the sigaction() function you can handle it without blocking the parent.
Bad things
Forked processes do not share the same address space, so if one process has, say, a variable var, the other process cannot access that same var directly. Communication is therefore a big issue.
To communicate you need to use IPC mechanisms such as pipes, named pipes, message queues or shared memory.
Of these, pipes, named pipes and message queues can use the read and write system calls, and because read and write are blocking calls your application stays synchronized, but these IPCs are comparatively slow. The only fast IPC is shared memory, but it cannot use read and write, so you need your own synchronization mechanism, such as semaphores, and implementing semaphores correctly in bigger applications is difficult.
Here comes pthread
Now threads remove most of the difficulties faced with fork().
A thread doesn't create a separate process.
It instead creates lightweight subtasks that can run practically in parallel.
They all share the same address space, so there is no need for IPC.
They come with mutexes, which cover the synchronization needs of even bigger applications.
Threads don't create a new process either; all threads are part of the same process and therefore share the same pid.
Note: in C++, std::thread is part of the standard library, not a system call.
Note 2: Boost threads in C++ are more mature and recommended.
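A minimal sketch using the C++11 standard threads rather than raw pthreads: both threads share the process's address space, so the shared counter needs only a mutex, no IPC.
#include <thread>
#include <mutex>
#include <iostream>

int main()
{
    int counter = 0;
    std::mutex mux;

    std::thread worker([&] {
        std::lock_guard<std::mutex> lock(mux);
        ++counter;                   // shared state, no IPC needed
    });

    worker.join();                   // roughly the thread analogue of wait()/waitpid()

    std::lock_guard<std::mutex> lock(mux);
    std::cout << "counter = " << counter << std::endl;
    return 0;
}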
The main idea, though, is to know when to use a thread and when to use a process.
If you need a sub-task that works in isolation rather than cooperating with other tasks, use a process; otherwise use a thread.
The exec family of syscalls is different: it keeps the same pid. So if you create an application of, say, 500 lines and you hit an exec call at line 250, the exec'ed program replaces your whole process image, and after the exec call your program will not resume at line 251. Also, exec calls don't flush your stdio buffers.
But yes, if you intend to create a separate process and then use an exec call to perform that task and exit, you are welcome to do it; just remember to set up the IPC to collect the output, otherwise it is of no use.
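A rough sketch of that fork()+exec() pattern, with a pipe as the IPC channel so the parent can collect the child's output and waitpid() to reap it (the command is just for illustration):
#include <cstdio>
#include <unistd.h>
#include <sys/wait.h>

int main()
{
    int fds[2];
    if (pipe(fds) != 0) { perror("pipe"); return 1; }

    pid_t pid = fork();
    if (pid == -1) { perror("fork"); return 1; }

    if (pid == 0)                      // child: replace the process image
    {
        dup2(fds[1], STDOUT_FILENO);   // send the child's stdout into the pipe
        close(fds[0]);
        close(fds[1]);
        execlp("ls", "ls", ".", (char*)NULL);
        _exit(127);                    // only reached if exec failed
    }

    close(fds[1]);                     // parent: read the child's output
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    close(fds[0]);

    int status = 0;
    waitpid(pid, &status, 0);          // reap the child
    return 0;
}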
For more info on fork click here
For more info on thread click here
For boost thread click here
@John Zwinck's answer is also good; I know little about the select() system call, but yes, it is possible that way too.
Edited: as @Jonathan Leffler pointed out.
Editing after a long time: after some years I would no longer reach for all these SPOOKY libraries or gruesome ways of parallel, or should I say SEEMINGLY parallel, processing. Enter coroutines, the future of CONCURRENT processing. Look at the following Go code; something similar is possible in C/C++ too. For 7.7 million rows in a database, this code would be barely a few milliseconds slower than its C/C++ thread-based implementation, but several times more manageable and scalable.
package main

import (
    "fmt"
    "reflect"

    "github.com/jinzhu/gorm"
    _ "github.com/jinzhu/gorm/dialects/sqlite"
)

type AirQuality struct {
    // gorm.Model
    // ID uint `gorm:"column:id"`
    Index   string `gorm:"column:index"`
    BEN     string `gorm:"column:BEN"`
    CH4     string `gorm:"column:CH4"`
    CO      string `gorm:"column:CO"`
    EBE     string `gorm:"column:EBE"`
    MXY     string `gorm:"column:MXY"`
    NMHC    string `gorm:"column:NMHC"`
    NO      string `gorm:"column:NO"`
    NO2     string `gorm:"column:NO_2"`
    NOX     string `gorm:"column:NOx"`
    OXY     string `gorm:"column:OXY"`
    O3      string `gorm:"column:O_3"`
    PM10    string `gorm:"column:PM10"`
    PM25    string `gorm:"column:PM25"`
    PXY     string `gorm:"column:PXY"`
    SO2     string `gorm:"column:SO_2"`
    TCH     string `gorm:"column:TCH"`
    TOL     string `gorm:"column:TOL"`
    Time    string `gorm:"column:date; type:timestamp"`
    Station string `gorm:"column:station"`
}

func (AirQuality) TableName() string {
    return "AQ"
}

func main() {
    c := generateRowsConcurrent("boring!!")
    for row := range c {
        fmt.Println(row)
    }
}

func generateRowsConcurrent(msg string) <-chan []string {
    c := make(chan []string)
    go func() {
        db, err := gorm.Open("sqlite3", "./load_testing_7.6m.db")
        if err != nil {
            panic("failed to connect database")
        }
        defer db.Close()

        rows, err := db.Model(&AirQuality{}).Limit(20).Rows()
        if err != nil {
            panic(err)
        }
        defer rows.Close() // only defer the Close after the error check

        for rows.Next() {
            var aq AirQuality
            db.ScanRows(rows, &aq)
            v := reflect.Indirect(reflect.ValueOf(aq))
            var buf []string
            for i := 0; i < v.NumField(); i++ {
                buf = append(buf, v.Field(i).String())
            }
            c <- buf
        }
        defer close(c)
    }()
    return c
}

Multiple threads writing to same socket causing issues

I have written a client/server application where the server spawns multiple threads depending upon the request from the client.
These threads are expected to send some data (strings) to the client.
The problem is that the data gets overwritten on the client side. How do I tackle this issue?
I have already read some other threads on similar issue but unable to find the exact solution.
Here is my client code to receive data.
while(1)
{
    char buff[MAX_BUFF];
    int bytes_read = read(sd,buff,MAX_BUFF);
    if(bytes_read == 0)
    {
        break;
    }
    else if(bytes_read > 0)
    {
        if(buff[bytes_read-1]=='$')
        {
            buff[bytes_read-1]='\0';
            cout<<buff;
        }
        else
        {
            cout<<buff;
        }
    }
}
Server Thread code :
void send_data(int sd,char *data)
{
    write(sd,data,strlen(data));
    cout<<data;
}

void *calcWordCount(void *arg)
{
    tdata *tmp = (tdata *)arg;
    string line = tmp->line;
    string s = tmp->arg;
    int sd = tmp->sd_c;
    int line_no = tmp->line_no;
    int startpos = 0;
    int finds = 0;
    while ((startpos = line.find(s, startpos)) != std::string::npos)
    {
        ++finds;
        startpos+=1;
        pthread_mutex_lock(&myMux);
        tcount++;
        pthread_mutex_unlock(&myMux);
    }
    pthread_mutex_lock(&mapMux);
    int t=wcount[s];
    wcount[s]=t+finds;
    pthread_mutex_unlock(&mapMux);
    char buff[MAX_BUFF];
    sprintf(buff,"%s",s.c_str());
    sprintf(buff+strlen(buff),"%s"," occured ");
    sprintf(buff+strlen(buff),"%d",finds);
    sprintf(buff+strlen(buff),"%s"," times on line ");
    sprintf(buff+strlen(buff),"%d",line_no);
    sprintf(buff+strlen(buff),"%s","\n");
    send_data(sd,buff);
    delete (tdata*)arg;
}
On the server side, make sure the shared resource (the socket, along with its associated internal buffer) is protected against concurrent access.
Define and implement an application-level protocol used by the server so that the client can distinguish what the different threads sent.
As an additional note: one cannot rely on read()/write() reading/writing as many bytes as those two functions were told to. It is essential to check their return values to learn how many bytes were actually read or written, and to loop around them until all the data that was intended to be read/written has been read/written (see the sketch below).
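A rough sketch of both points, reusing the '$' terminator and pthread mutexes already present in the question's code: one mutex per socket serializes the writers, and a write-all loop deals with short writes.
#include <pthread.h>
#include <sys/socket.h>
#include <string>

static pthread_mutex_t sendMux = PTHREAD_MUTEX_INITIALIZER;

bool send_all(int sd, const char* data, size_t len)
{
    size_t sent = 0;
    while (sent < len)
    {
        ssize_t n = send(sd, data + sent, len - sent, 0);
        if (n <= 0)
            return false;            // let the caller decide how to handle the error
        sent += (size_t)n;
    }
    return true;
}

void send_message(int sd, const std::string& msg)
{
    std::string framed = msg + "$";  // application-level message terminator
    pthread_mutex_lock(&sendMux);    // serialize all writers on this socket
    send_all(sd, framed.data(), framed.size());
    pthread_mutex_unlock(&sendMux);
}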
You should protect the socket with a mutex: when a thread uses the socket, it should lock the mutex first.
Here is a mutex example.
I can't help you more without the server code, because the problem is probably in the server.

Exit an infinite looping thread elegantly

I keep running into this problem of trying to run a thread with the following properties:
runs in an infinite loop, checking some external resource, e.g. data from the network or a device,
gets updates from its resource promptly,
exits promptly when asked to,
uses the CPU efficiently.
First approach
One solution I have seen for this is something like the following:
void class::run()
{
    while(!exit_flag)
    {
        if (resource_ready)
            use_resource();
    }
}
This satisfies points 1, 2 and 3, but being a busy waiting loop, uses 100% CPU.
Second approach
A potential fix for this is to put a sleep statement in:
void class::run()
{
    while(!exit_flag)
    {
        if (resource_ready)
            use_resource();
        else
            sleep(a_short_while);
    }
}
We now don't hammer the CPU, so we address 1 and 4, but we could wait up to a_short_while unnecessarily when the resource is ready or we are asked to quit.
Third approach
A third option is to do a blocking read on the resource:
void class::run()
{
    while(!exit_flag)
    {
        obtain_resource();
        use_resource();
    }
}
This will satisfy 1, 2, and 4 elegantly, but now we can't ask the thread to quit if the resource does not become available.
Question
The best approach seems to be the second one, with a short sleep, so long as the tradeoff between CPU usage and responsiveness can be achieved.
However, this still seems suboptimal, and inelegant to me. This seems like it would be a common problem to solve. Is there a more elegant way to solve it? Is there an approach which can address all four of those requirements?
This depends on the specifics of the resources the thread is accessing, but basically, to do it efficiently with minimal latency, the resources need to provide an API for an interruptible blocking wait.
On POSIX systems, you can use the select(2) or poll(2) system calls to do that, if the resources you're using are files or file descriptors (including sockets). To allow the wait to be preempted, you also create a dummy pipe which you can write to.
For example, here's how you might wait for a file descriptor or socket to become ready or for the code to be interrupted:
// Dummy pipe used for sending interrupt message
int interrupt_pipe[2];
int should_exit = 0;

void class::run()
{
    // Set up the interrupt pipe
    if (pipe(interrupt_pipe) != 0)
        ; // Handle error

    int fd = ...; // File descriptor or socket etc.

    while (!should_exit)
    {
        // Set up a file descriptor set with fd and the read end of the dummy
        // pipe in it
        fd_set fds;
        FD_ZERO(&fds);
        FD_SET(fd, &fds);
        FD_SET(interrupt_pipe[0], &fds);
        int maxfd = max(fd, interrupt_pipe[0]);

        // Wait until one of the file descriptors is ready to be read
        int num_ready = select(maxfd + 1, &fds, NULL, NULL, NULL);
        if (num_ready == -1)
            ; // Handle error

        if (FD_ISSET(fd, &fds))
        {
            // fd can now be read/recv'ed from without blocking
            read(fd, ...);
        }
    }
}

void class::interrupt()
{
    should_exit = 1;

    // Send a dummy message to the write end of the pipe to wake up the select() call
    char msg = 0;
    write(interrupt_pipe[1], &msg, 1);
}

class::~class()
{
    // Clean up pipe etc.
    close(interrupt_pipe[0]);
    close(interrupt_pipe[1]);
}
If you're on Windows, the select() function still works for sockets, but only for sockets, so you should instead use WaitForMultipleObjects to wait on a resource handle and an event handle. For example:
// Event used for sending interrupt message
HANDLE interrupt_event;
int should_exit = 0;

void class::run()
{
    // Set up the interrupt event as an auto-reset event
    interrupt_event = CreateEvent(NULL, FALSE, FALSE, NULL);
    if (interrupt_event == NULL)
        ; // Handle error

    HANDLE resource = ...; // File or resource handle etc.

    while (!should_exit)
    {
        // Wait until one of the handles becomes signaled
        HANDLE handles[2] = {resource, interrupt_event};
        int which_ready = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
        if (which_ready == WAIT_FAILED)
            ; // Handle error
        else if (which_ready == WAIT_OBJECT_0)
        {
            // resource can now be read from without blocking
            ReadFile(resource, ...);
        }
    }
}

void class::interrupt()
{
    // Signal the event to wake up the waiting thread
    should_exit = 1;
    SetEvent(interrupt_event);
}

class::~class()
{
    // Clean up event etc.
    CloseHandle(interrupt_event);
}
You get an efficient solution if your obtain_resource() function supports a timeout value:
while(!exit_flag)
{
    obtain_resource_with_timeout(a_short_while);
    if (resource_ready)
        use_resource();
}
This effectively combines the sleep() with the obtain_resource() call.
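One possible shape for such a function, assuming the resource is an in-process queue fed by another thread (a sketch with C++11 primitives; the names mirror the pseudocode above):
#include <condition_variable>
#include <mutex>
#include <queue>
#include <chrono>

std::mutex mux;
std::condition_variable cv;
std::queue<int> items;           // stand-in for the resource
bool exit_flag = false;

// Returns true if an item was obtained, false on timeout or shutdown.
bool obtain_resource_with_timeout(int& out, std::chrono::milliseconds a_short_while)
{
    std::unique_lock<std::mutex> lock(mux);
    if (!cv.wait_for(lock, a_short_while, [] { return exit_flag || !items.empty(); }))
        return false;            // timed out
    if (exit_flag || items.empty())
        return false;            // woken up in order to shut down
    out = items.front();
    items.pop();
    return true;
}

// The producer (or the code requesting shutdown) pushes an item or sets
// exit_flag while holding the mutex, then calls cv.notify_all(), so the
// waiting thread reacts immediately instead of after a_short_while.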
Check out the manpage for nanosleep:
If the nanosleep() function returns because it has been interrupted by a signal, the function returns a value of -1 and sets errno to indicate the interruption.
In other words, you can interrupt sleeping threads by sending a signal (the sleep manpage says something similar). This means you can use your 2nd approach, and use an interrupt to immediately wake the thread if it's sleeping.
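A rough sketch of that, with SIGUSR1 and the helper names chosen just for illustration: install a no-op handler without SA_RESTART, so that nanosleep() returns early with EINTR when the signal arrives, e.g. via pthread_kill() aimed at the sleeping thread.
#include <signal.h>
#include <time.h>
#include <errno.h>

static void wake_handler(int) { /* no-op; its only job is to interrupt nanosleep() */ }

void install_wake_handler()
{
    struct sigaction sa = {};
    sa.sa_handler = wake_handler;   // note: no SA_RESTART, so blocking sleeps are interrupted
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
}

void interruptible_sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    if (nanosleep(&ts, NULL) == -1 && errno == EINTR)
    {
        // woken early, e.g. because another thread called pthread_kill(tid, SIGUSR1)
    }
}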
Use the Gang of Four Observer Pattern:
http://home.comcast.net/~codewrangler/tech_info/patterns_code.html#Observer
Callback, don't block.
The self-pipe trick can be used here.
http://cr.yp.to/docs/selfpipe.html
Assuming that you are reading the data from a file descriptor:
Create a pipe and select() for readability on the pipe's read end as well as on the resource you are interested in.
When data arrives on the resource, the thread wakes up and does the processing; otherwise it sleeps.
To terminate the thread, send it a signal, and in the signal handler write something on the pipe (something that will never come from the resource you are interested in, say NULL, to illustrate the point). The select call returns, and the thread, on reading that input, knows it got the poison pill, so it is time to exit and it calls pthread_exit().
EDIT: A better way is simply to notice that data arrived on the pipe at all and exit, rather than checking the value that came on the pipe.
The Win32 API uses more or less this approach:
someThreadLoop( ... )
{
    MSG msg;
    int retVal;

    while( (retVal = ::GetMessage( &msg, TaskContext::winHandle_, 0, 0 )) > 0 )
    {
        ::TranslateMessage( &msg );
        ::DispatchMessage( &msg );
    }
}
GetMessage itself blocks until any type of message is received, therefore consuming no CPU while idle (refer). If a WM_QUIT is received, it returns false, exiting the thread function gracefully. This is a variant of the producer/consumer mentioned elsewhere.
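For completeness, a one-line sketch of how another thread typically asks that loop to finish (someThreadId is assumed to be the id of the thread running someThreadLoop()): posting WM_QUIT to its message queue makes GetMessage return 0.
// someThreadId is the id of the thread running someThreadLoop(),
// e.g. captured via GetCurrentThreadId() inside that thread at startup
::PostThreadMessage( someThreadId, WM_QUIT, 0, 0 );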
You can use any variant of a producer/consumer, and the pattern is often similar. One could argue that one would want to split the responsibility concerning quitting and obtaining of a resource, but OTOH quitting could depend on obtaining a resource too (or could be regarded as one of the resources - but a special one). I would at least abstract the producer consumer pattern and have various implementations thereof.
Therefore:
AbstractConsumer:
void AbstractConsumer::threadHandler()
{
    do
    {
        try
        {
            process( dequeNextCommand() );
        }
        catch( const base_except& ex )
        {
            log( ex );
            if( ex.isCritical() ){ throw; }
            //else we don't want loop to exit...
        }
        catch( const std::exception& ex )
        {
            log( ex );
            throw;
        }
    }
    while( !terminated() );
}

virtual void /*AbstractConsumer::*/process( std::unique_ptr<Command>&& command ) = 0;

//Note:
// Either may or may not block until resource arrives, but typically blocks on
// a queue that is signalled as soon as a resource is available.
virtual std::unique_ptr<Command> /*AbstractConsumer::*/dequeNextCommand() = 0;
virtual bool /*AbstractConsumer::*/terminated() const = 0;
I usually encapsulate command to execute a function in the context of the consumer, but the pattern in the consumer is always the same.
Most of the approaches mentioned above do the following: a thread is created, it blocks waiting for the resource, then it's deleted.
If you're worried about efficiency, this is not the best approach when waiting for IO. On Windows at least, you'll allocate around 1 MB of memory in user mode, plus some in the kernel, for just one additional thread. What if you have many such resources? Having many waiting threads also increases context switches and slows down your program. What if the resource takes longer to become available and many requests are made? You may end up with tons of waiting threads.
Now, the solution (again, on Windows, but I'm sure there is something similar on other OSes) is to use a thread pool (the one provided by Windows). It will not only create a limited number of threads, it is also able to detect when a thread is waiting for IO, steal that thread and reuse it for other operations while the wait is in progress.
See http://msdn.microsoft.com/en-us/library/windows/desktop/ms686766(v=vs.85).aspx
Also, for more fine-grained control, but still with the ability to give up the thread while waiting for IO, see IO completion ports (I think they use a thread pool inside anyway): http://msdn.microsoft.com/en-us/library/windows/desktop/aa365198(v=vs.85).aspx

Libevent writes to the socket only after second buffer_write

Libevent is great and I love it so far. However, on an echo server, the write only goes out to the socket on the second write. My writing happens on another thread, a pump thread that talks to a db and does some minimal data massaging.
I verified this by setting up a callback for the write:
bufferevent_setcb( GetBufferEvent(), DataAvailable, DataWritten, HandleSocketError, this );
calling bufferevent_flush( m_bufferEvent, EV_READ|EV_WRITE, BEV_NORMAL ) doesn't seem to have any effect.
Here is the setup, just in case I blew it somewhere. I have dramatically simplified the overhead in my code base in order to obtain some help. This includes initialization of sockets, my thread init, etc. This is a multi-threaded app, so there may be some problem there. I start with this:
m_LibEventInstance = event_base_new();
evthread_use_windows_threads();

m_listener = evconnlistener_new_bind( m_LibEventInstance,
                                      OnAccept,
                                      this,
                                      LEV_OPT_CLOSE_ON_FREE | LEV_OPT_CLOSE_ON_EXEC | LEV_OPT_REUSEABLE,
                                      -1, // no maximum number of backlog connections
                                      (struct sockaddr*)&ListenAddress, socketSize );
if (!m_listener) {
    perror("Couldn't create listener");
    return false;
}
evconnlistener_set_error_cb( m_listener, OnSystemError );
AFAIK, this is copy and paste from samples so it should work. My OnAccept does the following:
void OnAccept( evconnlistener* listenerObj, evutil_socket_t newConnectionId, sockaddr* ClientAddr, int socklen, void* context )
{
    // We got a new connection! Set up a bufferevent for it.
    struct event_base* base = evconnlistener_get_base( listenerObj );
    struct bufferevent* bufferEvent = bufferevent_socket_new( base, newConnectionId, BEV_OPT_CLOSE_ON_FREE );

    bufferevent_setcb( GetBufferEvent(), DataAvailable, DataWritten,
                       HandleSocketError, this );

    // We have to enable it before our callbacks will be called.
    bufferevent_enable( GetBufferEvent(), EV_READ | EV_WRITE );

    DisableNagle( m_connectionId );
}
Now, I simply respond to data coming in and store it in a buffer for later processing. This is a multi-threaded application, so I will process the data later, massage it, or return a response to the client.
void DataAvailable( struct bufferevent* bufferEventObj, void* arg )
{
    const U32 MaxBufferSize = 8192;
    MyObj* This = (MyObj*) arg;
    U8 data[ MaxBufferSize ];
    size_t numBytesreceived;

    /* Read 8k at a time and send it to all connected clients. */
    while( 1 )
    {
        numBytesreceived = bufferevent_read( bufferEventObj, data, sizeof( data ) );
        if( numBytesreceived <= 0 ) // nothing to send
        {
            break;
        }
        if( This )
        {
            This->OnDataReceived( data, numBytesreceived );
        }
    }
}
The last thing that happens: once I have looked up my data and packaged it into a buffer, on a threaded timeslice I do this:
bufferevent_write( m_bufferEvent, buffer, bufferOffset );
It never, ever sends the first time. To get it to send, I have to send a second buffer full of data.
This behavior is killing me and I have spent a lot of hours on it. Any ideas?
//-------------------------------------------------------
I finally gave up and used this hack instead... there just was not enough info to tell me why libevent wasn't writing to the socket. This works just fine.
int result = send( m_connectionId, (const char* )buffer, bufferOffset, 0 );
I hit the problem too! I spent one day on it and finally solved it.
When you call event_base_dispatch, that thread sleeps until some event wakes it up. So if it is asleep when you call bufferevent_write from another thread, the bufferevent's fd is added to the event list, but it won't be picked up by epoll until the next loop iteration. You must therefore wake up the dispatch thread after calling bufferevent_write. One way is to set up a connected socket pair, add one end to the event_base, and send one byte to the other end whenever you need to wake up the dispatch thread.
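A rough sketch of that wake-up idea (names are just for illustration): the read end of a socket pair lives in the event_base, and the pump thread sends one byte right after each bufferevent_write() so the dispatch loop wakes up and picks the write up immediately. Enabling libevent's own locking (calling evthread_use_windows_threads() before event_base_new() and passing BEV_OPT_THREADSAFE to bufferevent_socket_new()) is also worth considering whenever another thread touches a bufferevent.
#include <event2/event.h>
#include <event2/util.h>
// plus <winsock2.h> on Windows or <sys/socket.h> on POSIX for send()/recv()

static evutil_socket_t wakePair[2];

static void OnWake( evutil_socket_t fd, short /*events*/, void* /*arg*/ )
{
    char b;
    recv( fd, &b, 1, 0 );   // drain the byte so the event does not fire again
}

void SetupWakeup( event_base* base )
{
    evutil_socketpair( AF_UNIX, SOCK_STREAM, 0, wakePair );
    event* wakeEvent = event_new( base, wakePair[0], EV_READ | EV_PERSIST, OnWake, NULL );
    event_add( wakeEvent, NULL );
}

void WakeDispatchThread()   // call this from the pump thread right after bufferevent_write()
{
    char b = 1;
    send( wakePair[1], &b, 1, 0 );
}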