I'm in a situation where I have to "ping" multiple sockets (not ICMP; by "ping" I mean using an application protocol I made to signal each socket and check whether it has died, similar to a watchdog timer).
Since my library was limited to asynchronous select (bound to a window message loop), I decided to improve its efficiency: instead of receiving the data directly through the GUI messages, I forward it to a thread pool via a data structure and let a queue of threads handle it.
My initial idea was to use two semaphores: one to drive a blocking queue of IO requests, and another to handle all the timeout pings.
Does this seem like a reasonable idea? Is there a better solution; perhaps a timer, mutex or something else?
A second question: apart from a synchronization object, is there any other way I can create a blocking container? I'm not accepting Sleep(1) solutions, by the way.
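For reference, here is a minimal sketch of the kind of semaphore-backed blocking queue I have in mind (Win32; the class and member names are just illustrative):

    // Blocking queue: the semaphore counts queued items, the critical
    // section guards the underlying container.
    #include <windows.h>
    #include <climits>
    #include <deque>

    template <typename T>
    class BlockingQueue {
    public:
        BlockingQueue() {
            InitializeCriticalSection(&lock_);
            sem_ = CreateSemaphore(nullptr, 0, LONG_MAX, nullptr);
        }
        ~BlockingQueue() {
            CloseHandle(sem_);
            DeleteCriticalSection(&lock_);
        }
        void Push(const T &item) {
            EnterCriticalSection(&lock_);
            items_.push_back(item);
            LeaveCriticalSection(&lock_);
            ReleaseSemaphore(sem_, 1, nullptr);   // wake one waiting thread
        }
        T Pop() {                                 // blocks until an item arrives
            WaitForSingleObject(sem_, INFINITE);
            EnterCriticalSection(&lock_);
            T item = items_.front();
            items_.pop_front();
            LeaveCriticalSection(&lock_);
            return item;
        }
    private:
        CRITICAL_SECTION lock_;
        HANDLE sem_;
        std::deque<T> items_;
    };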
Thank you.
I've got a ROUTER/DEALER setup where both ends need to be able to receive and send data asynchronously, as soon as it's available. The model is pretty much 0MQ's async C++ server: http://zguide.zeromq.org/cpp:asyncsrv
Both the client and the server workers poll; when data is available, they call a callback. While this happens, from another thread (!) I'm putting data into a std::deque. In each poll-forever thread, I check the deque (under lock), and if there are items there, I send them out to the specified DEALER id (the id is placed in the queue).
But I can't help thinking that this is not idiomatic 0MQ. The mutex is possibly a design problem. Plus, memory consumption can probably get quite high if enough time passes between polls (and data accumulates in the deque).
The only alternative I can think of is having another DEALER thread connect to an inproc each time I want to send out data, and just have it send it and exit. However, this implies a connect per item of data sent + construction and destruction of a socket, and it's probably not ideal.
Is there an idiomatic 0MQ way to do this, and if so, what is it?
I don't fully understand your design, but I do understand your concern about using locks.
In most cases you can redesign your code to remove the locks by using ZeroMQ PAIR sockets and inproc.
Do you really need a std::deque? If not, you could just use a ZeroMQ queue, as it's just a queue that you can read from and write to from different threads using sockets.
If you really need the deque, then encapsulate it into its own thread (a class would be nice) and make its API (push, etc.) accessible via inproc sockets.
So, like I said, I may be on the wrong track, but in 99% of the cases I have come across you can remove the locks completely with ZMQ_PAIR/inproc if you need signalling.
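As a rough sketch of what I mean (cppzmq; the endpoint name and the forwarding logic are just placeholders), the sending thread "pushes" by sending on a PAIR socket and the worker blocks on recv instead of locking a shared deque:

    #include <zmq.hpp>
    #include <string>
    #include <thread>

    int main() {
        zmq::context_t ctx(1);

        // The stable end binds first, so the inproc endpoint exists
        // before the worker thread connects to it.
        zmq::socket_t tx(ctx, ZMQ_PAIR);
        tx.bind("inproc://outbox");

        std::thread worker([&ctx] {
            zmq::socket_t rx(ctx, ZMQ_PAIR);
            rx.connect("inproc://outbox");
            while (true) {
                zmq::message_t msg;
                rx.recv(&msg);   // blocks; no mutex, no shared container
                // forward msg on the DEALER socket owned by this thread...
            }
        });

        std::string payload = "data for dealer id X";
        tx.send(payload.data(), payload.size());   // enqueue by sending

        worker.join();   // the sketch runs until killed
        return 0;
    }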
A 0MQ socket queue has a limited buffer size (the high-water mark), and it can be controlled. So memory use will grow only up to that point, and then data will start being dropped. For that reason you may consider using the conflate option, which leaves only the most recent data in the queue.
In the case of a single server communicating within a single machine across many threads, I suggest using the publish/subscribe model: with the conflate option you will receive the newest data as soon as you read, you won't have to worry about memory, and it removes the blocking-queue problem.
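A minimal sketch of the conflate option on a subscriber (cppzmq; requires a libzmq version that supports ZMQ_CONFLATE, and the endpoint name is just illustrative):

    #include <zmq.hpp>

    int main() {
        zmq::context_t ctx(1);
        zmq::socket_t sub(ctx, ZMQ_SUB);
        int conflate = 1;
        // Keep only the most recent unread message; older ones are dropped.
        sub.setsockopt(ZMQ_CONFLATE, &conflate, sizeof(conflate));
        sub.setsockopt(ZMQ_SUBSCRIBE, "", 0);    // subscribe to everything
        sub.connect("inproc://feed");
        zmq::message_t latest;
        sub.recv(&latest);                       // always yields the newest message
        return 0;
    }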
As for your implementation, you are quite right: it is not the best design, but it is quite unavoidable. I suggest checking the question "Access std::deque from 3 threads"; while it addresses your problem, it may not be the best approach.
I am working on designing a websocket server which receives a message and saves it to an embedded database. For reading the messages I am using boost asio. To save the messages to the embedded database I see a few options in front of me:
Save the messages synchronously as soon as I receive them over the same thread.
Save the messages asynchronously on a separate thread.
I am pretty sure the second option is what I want. However, I am not sure how to pass messages from the socket thread to the IO thread. I see the following options:
Use one io service per thread and use the post function to communicate between threads. Here I have to worry about lock contention. Should I?
Use Unix domain sockets to pass messages between threads. No lock contention as far as I understand. Here I can probably use the BOOST_ASIO_DISABLE_THREADS macro to get some performance boost.
Also, I believe it would help to have multiple IO threads which would receive messages in a round robin fashion to save to the embedded database.
Which architecture would be the most performant? Are there any other alternatives from the ones I mentioned?
A few things to note:
The messages are exactly 8 bytes in length.
Cannot use an external database. The database must be embedded in the running process.
I am thinking about using RocksDB as the embedded database.
I don't think you want to use a unix socket, which is always going to require a system call and pass data through the kernel. That is generally more suitable as an inter-process mechanism than an inter-thread mechanism.
Unless your database API requires that all calls be made from the same thread (which I doubt) you don't have to use a separate boost::asio::io_service for it. I would instead create an io_service::strand on your existing io_service instance and use the strand::dispatch() member function (instead of io_service::post()) for any blocking database tasks. Using a strand in this manner guarantees that at most one thread may be blocked accessing the database, leaving all the other threads in your io_service instance available to service non-database tasks.
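A minimal sketch of that idea (the handler and database-call names are just placeholders):

    #include <boost/asio.hpp>
    #include <cstdint>

    boost::asio::io_service io;
    boost::asio::io_service::strand db_strand(io);

    // Called from any io_service thread when an 8-byte message arrives.
    void on_message(std::uint64_t msg) {
        db_strand.dispatch([msg] {
            // Handlers in a strand never run concurrently, so the blocking
            // database write needs no extra locking here.
            // write_to_database(msg);   // assumed embedded-DB call
        });
    }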
Why might this be better than using a separate io_service instance? One advantage is that having a single instance with one set of threads is slightly simpler to code and maintain. Another minor advantage is that using strand::dispatch() will execute in the current thread if it can (i.e. if no task is already running in the strand), which may avoid a context switch.
For the ultimate optimization I would agree that using a specialized queue whose enqueue operation cannot make a system call could be fastest. But given that you have network i/o by producers and disk i/o by consumers, I don't see how the implementation of the queue is going to be your bottleneck.
After benchmarking/profiling I found Facebook's folly MPMCQueue implementation to be the fastest, by at least a 50% margin. If I use the non-blocking write method, the socket thread has almost no overhead and the IO threads remain busy. The number of system calls is also much lower than with other queue implementations.
The SPSC queue with a condition variable in Boost is slower. I am not sure why that is; it might have something to do with the adaptive spin that the folly queue uses.
Also, message passing (datagram Unix domain sockets in this case) turned out to be orders of magnitude slower, especially for larger messages. This might have something to do with the data being copied twice.
You probably only need one io_service -- you can create additional threads which will process events occurring within the io_service by providing boost::asio::io_service::run as the thread function. This should scale well for receiving 8-byte messages from clients over the network socket.
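For example, a minimal sketch of that thread setup (socket setup omitted; the pool size is arbitrary):

    #include <boost/asio.hpp>
    #include <thread>
    #include <vector>

    int main() {
        boost::asio::io_service io;
        // Keeps run() from returning while no handlers are pending yet.
        boost::asio::io_service::work work(io);

        // ... set up the listening socket and async receive handlers here ...

        std::vector<std::thread> pool;
        for (int i = 0; i < 4; ++i)
            pool.emplace_back([&io] { io.run(); });   // each thread processes events

        for (auto &t : pool)
            t.join();
        return 0;
    }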
For storing the messages in the database, it depends on the database & interface. If it's multi-threaded, then you might as well just send each message to the DB from the thread that received it. Otherwise, I'd probably set up a boost::lockfree::queue where a single reader thread pulls items off and sends them to the database, and the io_service threads append new messages to the queue when they arrive.
Is that the most efficient approach? I dunno. It's definitely simple, and gives you a baseline that you can profile if it's not fast enough for your situation. But I would recommend against designing something more complicated at first: you don't know whether you'll need it at all, and unless you know a lot about your system, it's practically impossible to say whether a complicated approach would perform any better than the simple one.
    #include <boost/lockfree/queue.hpp>
    #include <chrono>
    #include <condition_variable>
    #include <cstdint>
    #include <mutex>

    // Assumes Finished, cond_var, cond_mutex, add_to_database and
    // receive_from_network are defined elsewhere.
    void Consumer( boost::lockfree::queue<uint64_t> &message_queue ) {
        // Connect to database...
        while (!Finished) {
            // add_to_database is a functor that takes a message.
            message_queue.consume_all( add_to_database );
            // Use a timed wait to avoid missing a signal; it's OK to call
            // consume_all() even if there's nothing in the queue.
            std::unique_lock<std::mutex> lock(cond_mutex);
            cond_var.wait_for( lock, std::chrono::milliseconds(100) );
        }
    }

    void Producer( boost::lockfree::queue<uint64_t> &message_queue ) {
        while (!Finished) {
            uint64_t m = receive_from_network( );
            message_queue.push( m );
            cond_var.notify_all( );
        }
    }
Assuming that the constraint of using C++11 is not too hard in your situation, I would try using std::async to make an asynchronous call to the embedded DB.
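A minimal sketch of that approach (the database call is a placeholder; the futures are kept around because a future returned by std::async blocks in its destructor):

    #include <cstdint>
    #include <future>
    #include <vector>

    std::vector<std::future<void>> pending_writes;

    void on_message(std::uint64_t msg) {
        pending_writes.push_back(std::async(std::launch::async, [msg] {
            // write_to_database(msg);   // assumed embedded-DB call, runs on another thread
        }));
    }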
I'm trying to ping a URL on a server in the middle of my high-performance C++ application, where every millisecond is critical. I don't care about the return data from the query... I just need to send an HTTP request to a specific URL (to cause it to load), and I'm trying to find the most effective, non-blocking method to accomplish this.
My application uses Boost::ASIO, but most methods to do this seem to involve building and tearing down sockets each time (which might unfortunately be necessary), but I'm hoping there's a basic C/C++ socket one-liner that won't cause any overhead, memory leaks, blocking, etc. Just quickly open a socket, shoot the HTTP request off, and move along.
And this will need to happen thousands of times per second, so socket overhead is important (I don't want to flood the OS).
Anyone have any advice on the most efficient way to accomplish this?
Thanks so much!
With thousands of notifications sent per second, I can't imagine opening a socket connection for each one. That would probably be too inefficient due to the overhead. So, as Casey suggested, try using a dedicated connection.
Since it sounds like you are doing quite a bit of processing on your main thread, you might consider creating a worker thread for the socket work. You will probably need to use thread synchronization objects like a mutex or critical section to single thread the code - at least when updating a container (probably a queue) from your main thread and reading it from the worker thread.
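As a rough sketch of that split (names are illustrative; the actual send would write the HTTP request on the dedicated, already-open connection):

    #include <condition_variable>
    #include <mutex>
    #include <queue>
    #include <string>

    std::mutex mtx;
    std::condition_variable cv;
    std::queue<std::string> pending;          // requests waiting to be sent

    void queue_ping(const std::string &url) { // called from the main thread
        { std::lock_guard<std::mutex> lk(mtx); pending.push(url); }
        cv.notify_one();
    }

    void worker() {                           // owns the persistent connection
        for (;;) {
            std::unique_lock<std::mutex> lk(mtx);
            cv.wait(lk, [] { return !pending.empty(); });
            std::string url = std::move(pending.front());
            pending.pop();
            lk.unlock();
            // send_http_get(url);            // assumed: write the GET on the socket
        }
    }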
I am implementing a set of simple protocols using boost::asio (oblivious transfer schemes). These are CPU-bound when they run. To improve efficiency, I want to try to keep both hosts working as much as possible. If host A has the choice between performing two tasks, one of which would let host B start computing and one which wouldn't, I want host A to pick the former.
Currently, the io_service is running computationally intensive handlers before async_writes. Unless the TCP window is full (or some similar condition is blocking writes to the socket), it's almost certainly better to finish the async_write rather than run some other handler.
I have seen asio's example of a priority queue for handlers. Is reimplementing async_write to use such a priority queue the only solution to my problem?
There's an example in the documentation describing how to attach a priority to completion handlers. You won't need to reimplement async_write, just implement your own version of the handler_priority_queue class from the example.
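To illustrate the core idea only (this is not the full example from the documentation, which also wraps the asynchronous operations' handlers so they land in the queue; the details here are simplified):

    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    // Handlers are added with a priority and drained highest-priority first,
    // instead of being invoked directly by the io_service.
    class handler_priority_queue {
    public:
        void add(int priority, std::function<void()> handler) {
            queue_.push({priority, std::move(handler)});
        }
        void execute_all() {
            while (!queue_.empty()) {
                queue_.top().second();   // run the highest-priority handler
                queue_.pop();
            }
        }
    private:
        using entry = std::pair<int, std::function<void()>>;
        struct by_priority {
            bool operator()(const entry &a, const entry &b) const {
                return a.first < b.first;
            }
        };
        std::priority_queue<entry, std::vector<entry>, by_priority> queue_;
    };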
REMOVED - reason: not really needed.
My questions are:
Can I use a Linux UDP socket from two different threads? The answer was here.
I have two different events I would like to wait for using just one thread. One of such events is the addition of an element to a stack and another is the availability of data on a socket.
I can use a boost::condition_variable.wait(lock) for the stack and boost::asio::io_service for the socket. But there is no mechanism (that I am aware of) that allows me to wait for both events at the same time (polling is out of the question). Or is it?
Is there any other alternative solution for this problem that I'm not aware of? - I'll figure this one out by myself.
New Answer
But there is no mechanism (that I am aware of) that allows me to wait for both events at the same time (polling is out of the question). Or is it?
Not that I'm aware of, and not without polling... you'll need a thread to wait for each asynchronous event. You can use a blocking stack or, like you said, use a boost::condition_variable, which blocks until there is something on the stack. The boost::asio::io_service will be very useful for managing the UDP sockets, but it doesn't actually give you any advantage when it comes to the event handling.
Old Answer
I'm REALLY not sure what you're trying to do... what you're saying doesn't make much sense. I'll do my best to guess what you're trying to do, but I would suggest clarifying the question.
Question:
Do I really need to use the main thread to send the data over component A socket or can I do it from the new-thread? (I think the answer is no, but I'm not sure about race conditions on sockets)
Answer:
You don't have to use the main thread to send data over the given component's socket. Now depending on the socket library you're using there might be different restrictions: you may only be able to send data on the same thread that the socket was created, or you might be able to send data from any thread... it really depends on the implementation of your socket.
Question:
How do I wait for both events?
Answer:
You can't do two things at the same time in the same thread... with that said you have two options:
Constantly poll to see if either event has occurred (on the same thread).
Have two threads that are blocking until a desired event occurs (usually when you read from a socket it blocks if there is no data).
Given the description of your problem it's unclear what you would achieve by using boost::condition_variable and/or boost::asio::io_service. Perhaps you should give us a very simple example of code that we can follow.
Question:
Is there any other alternative solution for this problem that I'm not aware of?
Answer:
There are always alternative solutions out there, but it's really difficult to tell what the alternatives might be given the current description of the "problem." I think that you should edit the problem again and focus on providing very concrete examples, perhaps some pseudo code, etc.
Switch to Windows and use WaitForMultipleObjects, or get this function implemented in Linux. It's quite handy, and then you can do two things on the same thread.
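A minimal sketch of that pattern on Windows (it assumes you already have two event handles, e.g. one you set whenever the stack is pushed and one associated with the socket via WSAEventSelect):

    #include <windows.h>

    void wait_loop(HANDLE stack_event, HANDLE socket_event) {
        HANDLE events[2] = { stack_event, socket_event };
        for (;;) {
            DWORD which = WaitForMultipleObjects(2, events, FALSE, INFINITE);
            if (which == WAIT_OBJECT_0) {
                // the stack was pushed: drain it here...
            } else if (which == WAIT_OBJECT_0 + 1) {
                // the socket is readable: recv here...
            }
        }
    }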