Boost.Asio multi-io_service RPC framework design RFC - C++

I am working on an RPC framework, and I want to use a multi-io_service design to decouple the io_objects that perform the IO (the front-end) from the threads that perform the RPC work (the back-end).
The front-end should be single-threaded and the back-end should have a thread pool. I was considering a design to get the front-end and back-end to synchronise using condition variables. However, it seems boost::thread and boost::asio do not commingle, i.e., condition variable async_wait support is not available. I have a question open on this matter here.
It occurred to me that io_service::post() might be used to synchronise the two io_service objects. I have attached a diagram below; I just want to know if I understand the post mechanism correctly, and whether this is a sensible implementation.

I assume that you use "a single io_service and a thread pool calling io_service::run()".
Also, I assume that your front-end is single-threaded just to avoid a race condition from multiple threads writing to the same socket.
The same goal can be achieved using io_service::strand (tutorial). Your front-end can be MT-synchronized by io_service::strand. All posts from back-end to front-end (and handlers from front-end to front-end, like handle_connect etc.) should be wrapped by the strand, something like this:
back-end -> front-end:
io_service.post(front_end.strand.wrap(
    boost::bind(&Front_end::send_response, front_end_ptr)));
or front-end -> front-end:
socket.async_connect(endpoint, strand.wrap(
    boost::bind(&Front_end::handle_connect, shared_from_this(),
        boost::asio::placeholders::error)));
And all posts from front-end to back-end shouldn't be wrapped by strand.
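Putting those pieces together, a fuller sketch of the wiring might look like this (illustrative only; the Front_end class, its member names, and the two io_service objects are assumptions, not from the question):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/enable_shared_from_this.hpp>

// Two io_services: front_io (single-threaded) and back_io (thread pool).
class Front_end : public boost::enable_shared_from_this<Front_end>
{
public:
    Front_end(boost::asio::io_service& front_io, boost::asio::io_service& back_io)
        : front_io_(front_io), back_io_(back_io),
          strand_(front_io), socket_(front_io) {}

    // front-end -> back-end: no strand; any pool thread may run it
    void dispatch_request()
    {
        back_io_.post(boost::bind(&Front_end::do_work, shared_from_this()));
    }

    // back-end -> front-end: wrap in the strand so socket writes never interleave
    void do_work() // runs on a back-end pool thread
    {
        front_io_.post(strand_.wrap(
            boost::bind(&Front_end::send_response, shared_from_this())));
    }

    void send_response() { /* safe to write to socket_ here */ }

private:
    boost::asio::io_service& front_io_;
    boost::asio::io_service& back_io_;
    boost::asio::io_service::strand strand_;
    boost::asio::ip::tcp::socket socket_;
};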

If your back-end is a thread pool calling any of the io_service::run(), io_service::run_one(), io_service::poll(), or io_service::poll_one() functions, and your handlers require access to shared resources, then you still have to take care to lock those shared resources somehow in the handlers themselves.
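For example (a generic sketch; the Request type and the queue are placeholders), a handler that touches shared state still locks it explicitly:

#include <boost/thread/mutex.hpp>
#include <deque>

struct Request { /* ... */ };

std::deque<Request> pending_requests;  // shared by all pool threads
boost::mutex pending_mutex;

void handle_request(const Request& r)  // may run on any pool thread
{
    boost::mutex::scoped_lock lock(pending_mutex);
    pending_requests.push_back(r);     // protected by the lock
}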
Given the limited amount of information posted in the question, I would assume this would work fine given the caveat above.
However, posting incurs some measurable overhead for setting up the necessary completion ports and waiting -- overhead you could avoid by using a different implementation for your back-end "queue".
Without knowing the exact details of what you need to accomplish, I would suggest that you look into Intel Threading Building Blocks for pipelines, or perhaps more simply its concurrent queue.
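A back-end queue built that way might look like this (a sketch, assuming Intel TBB is available; Task is a placeholder type):

#include <tbb/concurrent_queue.h>

struct Task { void run() { /* the actual RPC work */ } };

tbb::concurrent_bounded_queue<Task> work_queue;

void backend_worker()        // body of each back-end thread
{
    Task t;
    for (;;)
    {
        work_queue.pop(t);   // blocks until an item is available
        t.run();
    }
}

// The front-end thread simply does:
//     work_queue.push(task);   // wakes one waiting worker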

Related

Boost::Beast Non Blocking Read for Websockets?

We have an app that is entirely synchronous, and always will be, because it is basically a command-line interpreter to send low-level commands to our hardware, and you can't have two commands going to the hardware at the same time. I will only ever have one client socket for this configuration, operating in a synchronous manner: one command to the server, it talks to the hardware, and it sends the value back to the client. But as far as I can see, async_read is currently the only way to do non-blocking reads.
What is the best way to get a non-blocking read/write via Beast? For example, with TCP and serial ports on Windows you have ways to peek into the buffer to see if data is ready to be accessed, and if there is, you can issue your read command knowing it won't block because the data is there. Not sure if I am just missing this functionality in Beast, although I will say having such functionality, if possible, would be nice.
Anyway, based on this I have a question.
First, can I take the Coroutine example and, instead of using yield, create and pass it a read_handler function?
I've taken the coroutine example, and built the functions into my class, and used the exact same read_handler from this thread answer.
How to pass read handler to async_read for Beast websocket?
It compiles, as he says, but a breakpoint set in the handler is never hit when data is received.
I don't really need the full async functionality like the async example, pushing it into different threads; in fact that makes my life more difficult because the rest of the app is not async. And because we allow input from various sources (keyboard/TCP/serial/file), we can't block waiting for data.
What is the best way to get a non blocking read/write via Beast?
Because of the way the websocket stream is implemented, it is not possible to support non-blocking socket modes.
can I take the Coroutine example and instead of using yield, to create and pass it a read_handler function?
If you want to use completion handlers, I would suggest that instead of starting with the coroutine example you start with one of the asynchronous examples, since these are already written to use completion handlers.
Coroutines have blocking semantics, while completion handlers do not. If you try to use the coroutine example and replace the yield expression with a completion handler, the call to the initiating function will not block the way it does when using coroutines. And you should not use spawn. You said that the coroutine example is much easier; probably this is because it resembles synchronous code. If you want that ease of writing and understanding, then you have to use coroutines. Code using completion handlers will exhibit the "inversion of control" typically associated with callbacks. This is inherent to how they work and not something you can change by just starting with code that uses coroutines and changing the completion token.
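For reference, a completion-handler read against Beast's websocket stream looks roughly like this (a sketch; error handling and the surrounding session class are elided):

#include <boost/beast/core.hpp>
#include <boost/beast/websocket.hpp>
#include <boost/asio/ip/tcp.hpp>

namespace beast = boost::beast;
using tcp = boost::asio::ip::tcp;

void start_read(beast::websocket::stream<tcp::socket>& ws,
                beast::flat_buffer& buffer)
{
    // The initiating call returns immediately; the lambda runs later,
    // on a thread inside io_context::run(), once a whole message arrives.
    ws.async_read(buffer,
        [&ws, &buffer](beast::error_code ec, std::size_t /*bytes*/)
        {
            if (ec)
                return;
            // ... consume buffer.data() here, then re-arm the read:
            buffer.consume(buffer.size());
            start_read(ws, buffer);
        });
}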

Two threads using the same WebSocket handle - does it cause any issue?

We have a C++ application that sends and receives WebSocket messages:
one thread to send messages (using WinHttpWebSocketSend),
a second thread to receive them (using WinHttpWebSocketReceive).
But the same WebSocket handle is used across these two threads. Will it cause any problems? I don't know if we have to handle it another way. It works in our application - we are able to send and receive messages - but I don't know if it will have any problem in the production environment. Anyone have better ideas?
Like most platforms, nearly all Windows API system calls do not provide thread barriers beyond preventing simultaneous access to the key parts of the kernel. While I could not say for sure (the documentation doesn't seem to answer your explicit question), I would be surprised if the WinHTTP API provides barriers that prevent multiple threads from stepping on each other (so to speak) -- particularly because it's really just a "helper" API that uses the somewhat lower-level Winsock facilities directly -- and I would take it upon myself to implement the necessary barriers.
I'm also wondering why you're using threads in this manner to begin with. I know essentially nothing about the WinHTTP API, but I did notice WINHTTP_OPTION_ASSURED_NON_BLOCKING_CALLBACKS which leads me to believe that you can implement an asynchronous approach which would prevent any thread-safety issues to begin with (and probably be much faster and memory efficient).
It appears that the callback mechanism for WinHTTP is rather expressive. See WINHTTP_STATUS_CALLBACK. Presumably, you can simply use non-blocking operation, create an event listener, and associate the connection handle with dwContext. No threads involved.
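A rough sketch of that registration (the WinHTTP calls are real; the surrounding flow is illustrative only):

#include <windows.h>
#include <winhttp.h>

void CALLBACK on_status(HINTERNET /*handle*/, DWORD_PTR dwContext,
                        DWORD dwInternetStatus,
                        LPVOID /*lpvStatusInformation*/, DWORD /*cb*/)
{
    // dwContext carries the pointer you associated with the handle,
    // e.g. a per-connection state object.
    if (dwInternetStatus == WINHTTP_CALLBACK_STATUS_READ_COMPLETE)
    {
        // a receive completed -- no dedicated receive thread needed
    }
}

// After opening the session with WINHTTP_FLAG_ASYNC:
//     WinHttpSetStatusCallback(hSession, on_status,
//         WINHTTP_CALLBACK_FLAG_ALL_NOTIFICATIONS, 0);
//     DWORD_PTR context = reinterpret_cast<DWORD_PTR>(&my_state);
//     WinHttpSetOption(hRequest, WINHTTP_OPTION_CONTEXT_VALUE,
//         &context, sizeof(context));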

boost::asio starting different services in threads?

Seems like all the examples always show running the same io_service in all threads.
Can you start multiple io_services? Here is what I would like to do:
Start io_service A in the main thread for handling user input...
Start another io_service B in another thread that then can start a bunch of worker
threads all sharing io_service B.
Users on io_service A can "post" work on io_service B so that it gets done on the worker pool but no work is to be done on io_service A, i.e. the main thread.
Is this possible? Does this make sense?
Thanks
In my experience, it really depends on the application if an io_service per cpu or one per process is better performing. There was a discussion on the asio-users mailing list a few years ago on this very topic.
The Boost.Asio documentation has some great examples showing these two techniques in the HTTP Server 2 and HTTP Server 3 examples. But keep in mind that HTTP Server 2 just shows how to use the technique, not when or why to use it. Those questions will need to be answered by profiling your application.
In general, you should use the following order when creating applications using Boost.Asio:
1. Single threaded
2. Thread pool with a single io_service
3. Multiple io_service objects with some sort of CPU affinity
Good question!
Yes, it is possible. In an application I'm currently working on, I have broken the application up into separate components responsible for different aspects of the system. Each component runs in its own thread, has its own set of timers, and does its own network I/O using asio. From a testability/design perspective it seems cleaner to me, since no component can interfere with another, but I stand to be corrected. I suppose I could rewrite everything passing the io_service in as a parameter, but so far I haven't found the need to do so.
So coming back to your question: you can do whatever you want. IMO it's more a case of trying it out and changing it if you run into any issues.
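To make the layout from the question concrete, a minimal sketch (the names, pool size, and heavy_work are placeholders of mine, not from the question):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

void heavy_work() { /* runs on a pool thread */ }

void on_user_input(boost::asio::io_service& io_b)
{
    io_b.post(&heavy_work);    // executed by one of the worker threads
}

int main()
{
    boost::asio::io_service io_a;                 // main thread: user input
    boost::asio::io_service io_b;                 // worker pool
    boost::asio::io_service::work keep_b(io_b);   // keep the pool running

    boost::thread_group pool;
    for (int i = 0; i < 4; ++i)
        pool.create_thread(
            boost::bind(&boost::asio::io_service::run, &io_b));

    // A handler on io_a hands work to io_b; nothing heavy runs on io_a.
    io_a.post(boost::bind(&on_user_input, boost::ref(io_b)));

    io_a.run();       // main thread services io_a only
    io_b.stop();      // shut the pool down once io_a is done
    pool.join_all();
}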
Also, you might want to take a look at what Sam Miller pointed out in a different post WRT handling user input (that is, if you're using a console): https://stackoverflow.com/questions/5210796/boost-asio-how-to-write-console-server

boost asio multithreaded tcp server with synchronous I/O on a given thread

Basically, what I'm trying to achieve is to implement a generic multithreaded TCP server that can handle arbitrary requests, for use by two different servers with slightly different needs.
My requirements are:
A request cannot begin to be processed until the entire initial request has been received. (Essentially, I have a request header of a fixed size that, among other things, includes the size of the entire request.)
Handling a request may result in multiple response messages to the requesting client. I.e., normally requests can be handled with a single response, but at times, in response to long-running database transactions, I need to ping back to the client, letting them know that I'm still working so that they do not time out the connection.
To achieve this, I've been following fairly closely the HTTP server example #2 from Boost v1.44. In general, the example has worked for simple cases. What I've noticed is that when I scale up to handling multiple requests concurrently, the changes I've made have somehow resulted in all requests being handled serially, by a single thread. Obviously, I'm doing something wrong.
I cannot post the entirety of the actual code I'm using, due to employer restrictions, but suffice it to say, I've kept the async calls to accept new connections, but have replaced the async read/writes with synchronous calls. If there are specific pieces that you think you need to see, I can see what I can do.
Essentially, what I'm looking for are pointers on how to use boost::asio for a multithreaded TCP server where individual connections are handled by a single thread with synchronous I/O. Again, keep in mind that my abstraction is based upon HTTP server example #2 (one io_service per CPU), but I am flexible to alternatives.
The Boost.Asio documentation suggests using a single io_service per application, and invoking io_service::run from a pool of threads.
It's also not obvious to me why you cannot use asynchronous reads and writes combined with deadline_timer objects to periodically ping your clients. Such a design will almost certainly scale better than a thread-per-connection design using synchronous reads and writes.
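For the keep-alive part, a deadline_timer ping might look like this (a sketch; the actual ping write is left as a comment since the protocol is yours):

#include <boost/asio.hpp>
#include <boost/system/error_code.hpp>

void schedule_ping(boost::asio::deadline_timer& timer)
{
    timer.expires_from_now(boost::posix_time::seconds(5));
    timer.async_wait(
        [&timer](const boost::system::error_code& ec)
        {
            if (ec)
                return;            // timer cancelled: the request completed
            // async_write(socket, ping_message, ...);  // notify the client
            schedule_ping(timer);  // re-arm for the next interval
        });
}
// Call timer.cancel() when the long-running work finishes.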
Some diagnostics: can you print the value of io_service_pool_.get_io_service() before using it in the following code?
// from server.cpp
void server::handle_accept(const boost::system::error_code& e)
{
  if (!e)
  {
    new_connection_->start();
    new_connection_.reset(new connection(
        io_service_pool_.get_io_service(), request_handler_));
    acceptor_.async_accept(new_connection_->socket(),
        boost::bind(&server::handle_accept, this,
            boost::asio::placeholders::error));
  }
}
You'll need to store the result in a temporary before passing it to new_connection_.reset(); that is, don't call get_io_service() twice for this test. We first must make sure you're getting a new io_service each time.
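Concretely, the suggested diagnostic (a sketch, not a fix) would change the body to something like:

if (!e)
{
  new_connection_->start();
  // Grab the reference once and print its address; a round-robin pool
  // should show a different io_service on successive accepts.
  // (Requires <iostream>.)
  boost::asio::io_service& ios = io_service_pool_.get_io_service();
  std::cout << "next io_service: " << &ios << std::endl;
  new_connection_.reset(new connection(ios, request_handler_));
  acceptor_.async_accept(new_connection_->socket(),
      boost::bind(&server::handle_accept, this,
          boost::asio::placeholders::error));
}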
If you are doing lots of synchronous I/O, your concurrency is limited to the number of threads you have. I would suggest having one io_service for all your asynchronous I/O (i.e., all the comms and timers), as you have now, and then deciding how to deal with the synchronous I/O.
For the synchronous I/O, you need to decide what your peak concurrency will be. Because it is synchronous and it is I/O, you will want more threads than CPUs, and the decision will be based on how much I/O concurrency you want. Use a separate io_service, and then use io_service::dispatch() to distribute work to the threads doing the synchronous workload.
Doing it this way avoids the problem of a blocking I/O call stopping processing on other asynchronous events.
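Wiring that up might look like this (a sketch; the pool size and do_blocking_query are placeholders):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

void do_blocking_query() { /* the synchronous I/O happens here */ }

int main()
{
    boost::asio::io_service async_io;               // comms, timers
    boost::asio::io_service sync_io;                // blocking work
    boost::asio::io_service::work keep(sync_io);    // keep the pool alive

    boost::thread_group sync_pool;                  // more threads than CPUs,
    for (int i = 0; i < 16; ++i)                    // sized for I/O concurrency
        sync_pool.create_thread(
            boost::bind(&boost::asio::io_service::run, &sync_io));

    // From any async handler, hand the blocking call to the pool:
    sync_io.dispatch(&do_blocking_query);

    async_io.run();
    sync_io.stop();
    sync_pool.join_all();
}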

Is Perforce's C++ P4API thread-safe?

Simple question - is the C++ API provided by Perforce thread-safe? There is no mention of it in the documentation.
By "thread-safe" I mean for server requests from the client. Obviously there will be issues if I have multiple threads trying to set client names and such on the same connection.
But given a single connection object, can I have multiple threads fetching changelists, getting status, translating files through a p4 map, etc.?
Late answer, but... From the release notes themselves:
Known Limitations
The Perforce client-server protocol is not designed to support
multiple concurrent queries over the same connection. For this
reason, multi-threaded applications using the C++ API or the
derived APIs (P4API.NET, P4Perl, etc.) should ensure that a
separate connection is used for each thread or that only one
thread may use a shared connection at a time.
It does not look like the client object has thread affinity, so in order to share a connection between threads, one just has to use a mutex to serialize the calls.
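For example, a thin wrapper that serializes access might look like this (a sketch; P4Connection stands in for Perforce's client/connection object, which is not named here):

#include <boost/thread/mutex.hpp>

struct P4Connection { /* stands in for the Perforce client object */ };

class SerializedConnection
{
public:
    // Run any operation with exclusive access to the shared connection.
    template <typename Op>
    void with_connection(Op op)
    {
        boost::mutex::scoped_lock lock(mutex_);  // one thread at a time
        op(connection_);
    }

private:
    boost::mutex mutex_;
    P4Connection connection_;
};

// usage, from any thread:
//     conn.with_connection([](P4Connection& c) { /* fetch changelists */ });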
If the documentation doesn't mention it, then it is not safe.
Making something thread-safe in any sense is often difficult and may result in a performance penalty because of the addition of locks. It wouldn't make sense to go through the trouble and then not mention it in the documentation.