How do I use select() and gRPC to create a server? - c++

I need to use gRPC but in a single-threaded application (with additional socket channels). Naively, I'm thinking of using select() and depending on which file descriptor pops, calling gRPC to handle the message. My question is, can someone give me a rough (5-10 lines of code) outline skeleton on what I need to call after the select() pops?
Looking at Google's "hello world" example in the synchronous case implies a thread pool (which I can't use), and in the asynchronous case shows the main loop blocking -- which doesn't work for me because I need to handle other socket operations.

You can't do it, at this point (and probably ever).
One of the big weaknesses of event loops, including direct use of select()/poll() style APIs, is that they aren't composable in any natural way short of direct integration between the two.
We could theoretically add such functionality for Linux -- exporting an epoll_fd with a timerfd which becomes readable if it would be productive to call into a completion queue, but doing so would impose substantial constraints and architectural overhead on the rest of the stack just to support this usecase only on Linux. Everywhere else would require a background thread to manage that fd's readability.

This can be done using a gRPC async service along with grpc::Alarm to send any events that come from select or other polling APIs onto the gRPC completion queue. You can see an example using Epoll and gRPC together in this gist. The important functions are these two:
bool grpc_tick(grpc::ServerCompletionQueue& queue) {
void* tag = nullptr;
bool ok = false;
auto next_status = queue.AsyncNext(&tag, &ok, std::chrono::system_clock::now());
if (next_status == grpc::CompletionQueue::GOT_EVENT) {
if (ok && tag) {
static_cast<RequestProcessor*>(tag)->grpc_queue_tick();
} else {
std::cerr << "Not OK or bad tag: " << ok << "; " << tag << std::endl;
return false;
}
}
return next_status != grpc::CompletionQueue::SHUTDOWN;
}
bool tick_loops(int epoll, grpc::ServerCompletionQueue& queue) {
// Pump epoll events over to gRPC's completion queue.
epoll_event event{0};
while (epoll_wait(epoll, &event, /*maxevents=*/1, /*timeout=*/0)) {
grpc::Alarm alarm;
alarm.Set(&queue, std::chrono::system_clock::now(), event.data.ptr);
if (!grpc_tick(queue)) return false;
}
// Make sure gRPC gets at least 1 tick.
return grpc_tick(queue);
}
Here you can see the tick_loops function repeatedly calls epoll_wait until no more events are returned. For each epoll event, a grpc::Alarm is constructed with the deadline set to right now. After that, the gRPC event loop is immediately pumped with grpc_tick.
Note that the grpc::Alarm instance MUST outlive its time on the completion queue. In a real-world application, the alarm should be somehow attached to the tag (event.data.ptr in this example) so it can be cleaned up in the completion callback.
The gRPC event loop is then pumped again to ensure that any non-epoll events are also processed.
Completion queues are thread safe, so you could also put the epoll pump on one thread and the gRPC pump on another. With this setup you would not need to set the polling timeouts for each to 0 as they are in this example. This would reduce CPU usage by limiting dry cycles of the event loop pumps.

Related

boost::asio wite() API stuck while writing the data

While sending some data to client (multiple chunks of data); if the client stop reading the data after some packets, the server gets stuck on boost::asio::write() which results in unwanted behavior of the product.
We thought of shifting to async_write() and have a timer over it so that if such condition occurs, we could fallback to original good state, but due to design faults we could not use io_service (due to high concurrency) after async_write which resulted in not getting callbacks to stop the timer.
So, is there any way through which (without using io_serivce) we can unblock the write() API.
Somthing like we could execute write() API on a separate thread and terminate it through some timer. But here the question arises, is there any way through which we can clear out the boost buffers which already has some pending write data ?
Any help would be appreciated.
Thanks.
Eventually went with using boost::asio::async_write() but with io_service::poll() -> poll being non-blocking.
run() was not an option as the system is highly concurrent and read/write had to share the same io_service.
Pseudo code looks something like this:
data_to_write = size of data;
set current_bytes_transffered = 0
set timeout_occurred to false
/*
current_bytes_transffered -> obtained from async_write() callback
timeout_occurred -> obtained from a seperate timer
*/
while((data_to_write != current_bytes_transffered) || (!timeout_occurred))
{
// poll() is used instead of run() as the system
// has high concurrency and read and write operations
// shares same io_service
io_service.poll();
if(data_to_write == current_bytes_transffered)
{
// SUCCESS write logic
}
else if(timeout_occurred)
{
// timeout logic
}
}

Concurrent request processing with Boost Beast

I'm referring to this sample program from the Beast repository: https://www.boost.org/doc/libs/1_67_0/libs/beast/example/http/server/fast/http_server_fast.cpp
I've made some changes to the code to check the ability to process multiple requests simultaneously.
boost::asio::io_context ioc{1};
tcp::acceptor acceptor{ioc, {address, port}};
std::list<http_worker> workers;
for (int i = 0; i < 10; ++i)
{
workers.emplace_back(acceptor, doc_root);
workers.back().start();
}
ioc.run();
My understanding with the above is that I will now have 10 worker objects to run I/O, i.e. handle incoming connections.
So, my first question is the above understanding correct?
Assuming that the above is correct, I've made some changes to the lambda (handler) passed to the tcp::acceptor:
void accept()
{
// Clean up any previous connection.
boost::beast::error_code ec;
socket_.close(ec);
buffer_.consume(buffer_.size());
acceptor_.async_accept(
socket_,
[this](boost::beast::error_code ec)
{
if (ec)
{
accept();
}
else
{
boost::system::error_code ec2;
boost::asio::ip::tcp::endpoint endpoint = socket_.remote_endpoint(ec2);
// Request must be fully processed within 60 seconds.
request_deadline_.expires_after(
std::chrono::seconds(60));
std::cerr << "Remote Endpoint address: " << endpoint.address() << " port: " << endpoint.port() << "\n";
read_request();
}
});
}
And also in process_request():
void process_request(http::request<request_body_t, http::basic_fields<alloc_t>> const& req)
{
switch (req.method())
{
case http::verb::get:
std::cerr << "Simulate processing\n";
std::this_thread::sleep_for(std::chrono::seconds(30));
send_file(req.target());
break;
default:
// We return responses indicating an error if
// we do not recognize the request method.
send_bad_response(
http::status::bad_request,
"Invalid request-method '" + req.method_string().to_string() + "'\r\n");
break;
}
}
And here's my problem: If I send 2 simultaneous GET requests to my server, they're being processed sequentially, and I know this because the 2nd "Simulate processing" statement is printed ~30 seconds after the previous one which would mean that execution gets blocked on the first thread.
I've tried to read the documentation of boost::asio to better understand this, but to no avail.
The documentation for acceptor::async_accept says:
Regardless of whether the asynchronous operation completes immediately or not, the handler will not be >invoked from within this function. Invocation of the handler will be performed in a manner equivalent to >using boost::asio::io_service::post().
And the documentation for boost::asio::io_service::post() says:
The io_service guarantees that the handler will only be called in a thread in which the run(), >run_one(), poll() or poll_one() member functions is currently being invoked.
So, if 10 workers are in the run() state, then why would the two requests get queued?
And also, is there a way to workaround this behavior without adapting to a different example? (e.g. https://www.boost.org/doc/libs/1_67_0/libs/beast/example/http/server/async/http_server_async.cpp)
io_context does not create threads internally to execute the tasks, but rather uses the threads that call io_context::run explicitly. In the example the io_context::run is called just from one thread (main thread). So you have just one thread for task executions, which (thread) gets blocked in sleep and there is no other thread to execute other tasks.
To make this example work you have to:
Add more thread into the pool (like in the second example you referred to)
size_t const threads_count = 4;
std::vector<std::thread> v;
v.reserve(threads_count - 1);
for(size_t i = 0; i < threads_count - 1; ++i) { // add thraed_count threads into the pool
v.emplace_back([&ioc]{ ioc.run(); });
}
ioc.run(); // add the main thread into the pool as well
Add synchronization (for example, using strand like in the second example) where it is needed (at least for socket reads and writes), because now your application is multi-threaded.
UPDATE 1
Answering to the question "What is the purpose of a list of workers in the Beast example (the first one that referred) if in fact io_context is only running on one thread?"
Notice, regardless of thread count IO operations here are asynchronous, meaning http::async_write(socket_...) does not block the thread. And notice, that I explain here the original example (not your modified version). One worker here deals with one round-trip of 'request-response'. Imagine the situation. There are two clients client1 and client2. Client1 has poor internet connection (or requests a very big file) and client2 has the opposite conditions. Client1 makes request. Then client2 makes request. So if there was just one worker client2 would had to wait until client1 finished the whole round-trip 'request-response`. But, because there are more than one workers client2 gets response immediately not waiting the client1 (keep in mind IO does not block your single thread). The example is optimized for situation where bottleneck is IO but not the actual work. In your modified example you have quite the opposite situation - the work (30s) is very expensive compared to IO. For that case better use the second example.

Is it expected for poll() to take 40ms to return even though data will be available sooner?

I created a proxy server to handle CQL orders from website clients. The proxy listens for incoming connections and each connection is given a thread. The thread loops as long as the socket exists and dies on HUP. You may also stop the proxy, which will stop the threads by sending an event (See eventfd()) to each thread.
By itself, this already allows me to save a good 100ms because the proxy is local and connecting to a local service is much faster than a service on a remote computer... (even if the computer is local.)
However, I send orders and once in a while the proxy sees no incoming data (i.e. it calls read() on the socket which is setup as NONBLOCK and gets -1 in return and errno == EAGAIN.) When that happens, I call poll() to wait for additional data, the HUP, or a hit on the eventfd meaning I have to quit (i.e. 2 fds, the socket and the eventfd).
Somehow, more often than not, when I hit the poll() function call, it adds an extra 40ms to the time it takes for a message to go round trip. Although one would think this only happens on larger messages, it happens when I receive an order, which is less than 100 bytes! So the size should not be the culprit. I also changed the code to make sure I send the entire order from the client to the proxy in one write() and to avoid the poll() if at all possible (i.e. I call read() first, and poll() only if nothing is available.)
Note that I have no timeout in this case because there is nothing to check other than the incoming orders and the eventfd. So I would imagine that the timeout won't be a problem.
The code base is really big. But the client/server comes down to something like this (the sizes in original are fully dynamic):
// Client
...
connect(socket);
...
write(socket, order, sizeof(order));
read(socket, result, sizeof(result));
// repeat for other orders, as required by client...
// server
...
socket = accept(); // happens for each client
...
pthread_create(runner);
...
// server thread (runner)
...
for(;;)
{
int r(0);
for(;;)
{
r += read(socket, order, sizeof(order));
if(r >= sizeof(order))
{
break;
}
// wait for more data is not enough received yet
poll(..."socket" + "eventfd"...); // <-- this will often take 40ms
if(eventfd_happened)
{
// quit thread
return;
}
}
...
[work on order]
...
write(socket, result, sizeof(result));
}
Note 1: I see the problem when I have a single client. So having multiple clients does not in itself cause the problem either.
Note 2: The client really uses BIO_connect(), BIO_read() and BIO_write() [from OpenSSL], but I doubt that would be a problem. I do not use any kind of encryption.
I don't see why you're using non-blocking I/O given you have a dedicated thread per socket. Just block in read(). Use SO_RCVTIMEO if you need an overall read timeout.

how to simulate time delay in network

Let's say that we need to send this message Hellow World using UDP protocol between two PCs A and B . Computer A will send the message to B with some time delay (i.e. constant or time-varying). Now to simulate this scenario, my first attempt is to use sleep function but this solution will freezes the entire application. Another solution is to implement mutlithreads and use sleep() with the thread that is responsible for getting the data and store this in a global variable and access this variable from another thread. In this solution, there might be difficulties in the synchronization between the threads. To overcome this problem, I will write the received data in txt file and read it from another thread. My question is what is the proper way to carry out this trivial experiment? I will appreciate if the answer has some C++ pseudo.
Edit:
My attempt to solve it is as follows, for the Master side (client),
Master masterObj
int main()
{
masterObj.initialize();
masterObj.connect();
while( masterObj.isConnected() == true ){
get currentTime and data; // currentTime here is sendTime
datagram = currentTime + data;
masterObj.send( datagram );
}
}
For the Slave side (server), the pseudo code is
Slave slaveObj
int main()
{
slaveObj.initialize();
slaveObj.connect();
slaveObj.slaveThreadInit();
while( slaveObj.isConnected() == true ){
slaveObj.getData();
}
}
Slave::recieve()
{
get currentTime and call it recievedTime
get datagram from Master;
this->slaveThread( recievedTime + datagram );
}
Slave::slaveThread( info )
{
sleep( 1 msec );
info = recievedTime + datagram ;
get time delay;
time delay = sendTime - recievedTime;
extract data from datagram;
insert data and time delay in txt file ( call it txtSlaveData);
}
Slave::getData()
{
read from txtSlaveData;
}
As you can see, I'm using an independent thread which inside it, I'm using sleep(). I'm not sure if this approach is applicable.
A simple way to simulate sending UDP datagrams from one computer to another is to send the datagrams through the loopback interface to another - or the same - process on the same computer. That will function exactly like the real thing except for the delay.
You can simulate the delay either when sending or receiving. Once you've implemented it one way, the other should be trivial. I think delaying the sending side is more natural option. Here is an approach for the more general problem of simulating network delay. See the last paragraph for a trivial experiment of sending only one datagram.
In case you choose delaying on send, what you could do is, instead of sending, store the datagram in a queue, along with the time it should be sent (target = now + delay).
Then, in another thread, wait for a datagram to become available, then sleep for max(target - now, 0). After sleeping, send the datagram and move on to the next one. Wait if queue is empty.
To simulate jitter, randomize the delay. To let jitter simulation send the datagrams in non-sequential order, use a priority queue, sorted by the target send-time.
Remember to synchronize the access to the queue.
For a single datagram, you can do much simpler. Simply start a new thread, sleep for the delay, send and end thread. No need for synchronization. Here's c++ code for that:
std::thread([]{
std::this_thread::sleep_for(delay);
send("foo");
}).detach();

Multi-threaded Server handling multiple clients in one thread

I wanted to create a multi-threaded socket server using C++11 and standard linux C-Librarys.
The easiest way doing this would be opening a new thread for each incoming connection, but there must be an other way, because Apache isn't doing this. As far as I know Apache handles more than one connection in a Thread. How to realise such a system?
I thought of creating one thread always listening for new clients and assigning this new client to a thread. But if all threads are excecuting an "select()" currently, having an infinite timeout and none of the already assigned client is doing anything, this could take a while for the client to be useable.
So the "select()" needs a timeout. Setting the timeout to 0.5ms would be nice, but I guess the workload could rise too much, couldn't it?
Can someone of you tell me how you would realise such a system, handling more than one client for each thread?
PS: Hope my English is well enough for you to understand what I mean ;)
The standard method to multiplex multiple requests onto a single thread is to use the Reactor pattern. A central object (typically called a SelectServer, SocketServer, or IOService), monitors all the sockets from running requests and issues callbacks when the sockets are ready to continue reading or writing.
As others have stated, rolling your own is probably a bad idea. Handling timeouts, errors, and cross platform compatibility (e.g. epoll for linux, kqueue for bsd, iocp for windows) is tricky. Use boost::asio or libevent for production systems.
Here is a skeleton SelectServer (compiles but not tested) to give you an idea:
#include <sys/select.h>
#include <functional>
#include <map>
class SelectServer {
public:
enum ReadyType {
READABLE = 0,
WRITABLE = 1
};
void CallWhenReady(ReadyType type, int fd, std::function<void()> closure) {
SocketHolder holder;
holder.fd = fd;
holder.type = type;
holder.closure = closure;
socket_map_[fd] = holder;
}
void Run() {
fd_set read_fds;
fd_set write_fds;
while (1) {
if (socket_map_.empty()) break;
int max_fd = -1;
FD_ZERO(&read_fds);
FD_ZERO(&write_fds);
for (const auto& pr : socket_map_) {
if (pr.second.type == READABLE) {
FD_SET(pr.second.fd, &read_fds);
} else {
FD_SET(pr.second.fd, &write_fds);
}
if (pr.second.fd > max_fd) max_fd = pr.second.fd;
}
int ret_val = select(max_fd + 1, &read_fds, &write_fds, 0, 0);
if (ret_val <= 0) {
// TODO: Handle error.
break;
} else {
for (auto it = socket_map_.begin(); it != socket_map_.end(); ) {
if (FD_ISSET(it->first, &read_fds) ||
FD_ISSET(it->first, &write_fds)) {
it->second.closure();
socket_map_.erase(it++);
} else {
++it;
}
}
}
}
}
private:
struct SocketHolder {
int fd;
ReadyType type;
std::function<void()> closure;
};
std::map<int, SocketHolder> socket_map_;
};
First off, have a look at using poll() instead of select(): it works better when you have large number of file descriptors used from different threads.
To get threads currently waiting in I/O out of waiting I'm aware of two methods:
You can send a suitable signal to the thread using pthread_kill(). The call to poll() fails and errno is set to EINTR.
Some systems allow a file descriptor to be obtained from a thread control device. poll()ing the corresponding file descriptor for input succeeds when the thread control device is signalled. See, e.g., Can we obtain a file descriptor for a semaphore or condition variable?.
This is not a trivial task.
In order to achieve that, you need to maintain a list of all opened sockets (the server socket and the sockets to current clients). You then use the select() function to which you can give a list of sockets (file descriptors). With correct parameters, select() will wait until any event happen on one of the sockets.
You then must find the socket(s) which caused select() to exit and process the event(s). For the server socket, it can be a new client. For client sockets, it can be requests, termination notification, etc.
Regarding what you say in your question, I think you are not understanding the select() API very well. It is OK to have concurrent select() calls in different threads, as long as they are not waiting on the same sockets. Then if the clients are not doing anything, it doesn't prevent the server select() from working and accepting new clients.
You only need to give select() a timeout if you want to be able to do things even if clients are not doing anything. For example, you may have a timer to send periodic infos to the clients. You then give select a timeout corresponding to you first timer to expire, and process the expired timer when select() returns (along with any other concurrent events).
I suggest you have a long read of the select manpage.