gRPC asynchronous requests and parallel server handling - C++

I have followed the C++ gRPC asynchronous documentation, and from Stack Overflow I found out that the client actually doing 'true' async requests is client2 here.
I now have all of that working nicely; however, the server code is not handling the incoming requests in parallel.
All the requests from the client go out, but then the responses come back one after the other. I know this because I have modified the server code to sleep for 5 seconds while processing each request, like so:
// inside CallData::Proceed(), in the PROCESS state of the async example:
// first spawn a new CallData instance to serve the next incoming request
new CallData(service_, cq_);
// now process the current request
std::cout << " processing " << request_.name() << std::endl;
sleep(5); // simulate 5 seconds of work per request
std::string prefix("Hello ");
reply_.set_message(prefix + request_.name() + "!");
// inform gRPC that we are finished with this call
this->status_ = this->FINISH;
responder_.Finish(reply_, Status::OK, this);
My expectation is that I (as the user) start the client, it fires off all 100 requests, and then after roughly 5 seconds in total all the responses come back. What actually happens is that I have to wait 5 seconds per request, because they are processed one after the other.
I am finding the documentation quite thin on the ground with regard to handling async requests, and I hoped someone might help me understand how to update the server side to process the requests in parallel.
Thank you
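
For reference, a minimal sketch of one commonly suggested fix (my illustration, not the original code): run several threads that all drain the same ServerCompletionQueue, so a handler that sleeps for 5 seconds only ties up one thread while the others keep dispatching events.
// Sketch only: assumes the ServerImpl/CallData classes of the official
// greeter_async_server example; num_threads is an illustrative value.
#include <thread>
#include <vector>

void ServerImpl::Run() {
  // ... ServerBuilder setup, cq_ = builder.AddCompletionQueue(),
  // server_ = builder.BuildAndStart(), as in the official example ...
  const int num_threads = 8;  // illustrative
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    // Each thread seeds its own CallData and then loops on cq_->Next();
    // CompletionQueue::Next() may be called concurrently from many threads.
    workers.emplace_back([this] { HandleRpcs(); });
  }
  for (auto& t : workers) {
    t.join();
  }
}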

Related

Why does the latency of a request-response message pair decrease when increasing the rate of messages sent over TCP?

Intro
I have a setup of a client and a server communicating over a TCP connection, and I experience weird latency behaviour which I can't understand.
Context
The client sends a request message to the server, which responds with a response message to the client.
I define latency as the time from sending a request message to receiving the corresponding response message. I can send request messages at different rates (throttling the frequency of requests); however, I always have at most one outstanding request message at any time, i.e. no concurrent/overlapping request-response message pairs.
I have implemented the sending of request and response messages in three ways: first directly on TCP sockets with my own serialization method etc., second using gRPC for communication over RPC on top of HTTP/2, and third using Apache Thrift (an RPC framework similar to gRPC).
gRPC is in turn implemented with 4 different client/server types, and for Thrift I have 3 different client/server types.
In all solutions, I experience a decrease in latency when increasing the sending rate of request messages (in gRPC and Thrift a request-response pair is communicated via an RPC method).
The best latency is observed when not throttling the request rate at all, but sending a new request as soon as a response is received.
Latency is measured using the std::chrono::steady_clock primitive.
I have no idea what's causing this. I make sure to warm up the TCP connection (getting past the TCP slow-start phase) by sending 10k request messages before starting the real testing.
How I implement the throttling and measure latency (on the client, of course):
double rate;
std::cout << "Enter rate (requests/second):" << std::endl;
std::cin >> rate;
auto interval = std::chrono::microseconds(1000000) / rate;
// warmup phase is here, but not included in this code;
// start_time is taken at the end of the warmup (not shown).
auto total_lat = std::chrono::microseconds(0);
auto iter_time = start_time;
int i = 0;
for (i = 0; i < 10000; i++) { // send 10k requests
    iter_time = std::chrono::steady_clock::now();
    RequestType request("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
    ResponseType response;
    auto start = std::chrono::steady_clock::now();
    sendRequest(request);      // these look different depending on gRPC/Thrift/"TCP"
    receiveResponse(&response);
    auto end = std::chrono::steady_clock::now();
    auto dur = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    total_lat += dur;
    std::this_thread::sleep_until(iter_time + interval); // throttle the sending
}
// mean latency: total_lat / i
I run the client/server in separate Docker containers using docker-compose, and I also run them in a Kubernetes cluster. In both cases I experience the same behaviour. I am thinking maybe my throttling/time-measuring code is doing stuff that I don't know about or understand.
The TCP sockets are set to TCP_NODELAY in all cases.
The servers are single/multithreaded, nonblocking/blocking, all kinds of different variations, and some of the clients are synchronous, some asynchronous, etc. So there is a lot of variation, yet the same behaviour across them all.
Any ideas as to what could cause such behaviour?
Right now I think the latency issue is not in the network stack, but in the rate at which you are generating and receiving messages.
Your test code does not appear to have any real-time assurances, which would also need to be set up in the container. This means that your for loop does not run at the same speed every time. The OS scheduler can pause it to run other processes (this is how processes share the CPU). This behaviour can get even more complicated with containerization mechanisms.
While there are mechanisms in TCP which can cause latency variations (as mentioned by @DNT), I don't think you would be seeing them, especially if the server and client are local. This is why I would rule out the rate of message generation and reception first before looking at the TCP stack.
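To illustrate the point about scheduling, here is a small sketch (mine, with illustrative numbers, not from the original answer) that measures how late std::this_thread::sleep_until actually wakes up; comparing the numbers on bare metal and inside the container shows how much jitter the throttling loop itself introduces.
// Sketch: measure the scheduler wake-up overshoot of sleep_until.
#include <chrono>
#include <iostream>
#include <thread>

int main() {
    using namespace std::chrono;
    const auto interval = microseconds(1000);  // 1 ms target period (illustrative)
    microseconds worst(0), total(0);
    const int iters = 1000;
    for (int i = 0; i < iters; ++i) {
        const auto target = steady_clock::now() + interval;
        std::this_thread::sleep_until(target);
        const auto overshoot = duration_cast<microseconds>(steady_clock::now() - target);
        total += overshoot;
        if (overshoot > worst) worst = overshoot;
    }
    std::cout << "mean overshoot: " << (total / iters).count() << " us, "
              << "worst: " << worst.count() << " us" << std::endl;
    return 0;
}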

zmq DEALER socket zmq_recv_msg call always times out

I am using zmq ROUTER and DEALER sockets in my application (C++).
One process is listening on a zmq ROUTER socket for clients to connect (a Service).
Clients connect to this service using a zmq DEALER socket. From the client I am making synchronous (blocking) requests to the service. To avoid an infinite wait for the response, I am setting RCVTIMEO on the DEALER socket to, let's say, 5 ms. After setting this timeout I observe unexpected behaviour on the client.
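The client-side setup described above looks roughly like this (an illustrative sketch using the libzmq C API; the endpoint and variable names are mine, not from the original post):
#include <zmq.h>

void* ctx = zmq_ctx_new();
void* dealer = zmq_socket(ctx, ZMQ_DEALER);
int timeout_ms = 5;  // RCVTIMEO of 5 ms
zmq_setsockopt(dealer, ZMQ_RCVTIMEO, &timeout_ms, sizeof(timeout_ms));
zmq_connect(dealer, "tcp://service-host:5555");  // illustrative endpoint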
Here are the details:
Case 1: No RCVTIMEO is set on DEALER (client) socket
In this case, let's say the client sends 1000 requests to the service. For around 850 of these requests, the client receives a response within 5 ms.
For the remaining 150 requests it takes more than 5 ms for the response to arrive.
Case 2: RCVTIMEO is set for 5 ms on DEALER (client) socket
In this case, for the first 150-200 requests I see a valid response received within the RCVTIMEO period. For all remaining requests I see an RCVTIMEO timeout happening, which is not expected. The requests in both cases are the same.
The expected behaviour would be: for the ~850 requests we should receive a valid response (as they arrive within RCVTIMEO), and for the remaining ~150 requests we should see a timeout happening.
To get the timeout feature, I also tried zmq_poll() instead of setting RCVTIMEO, but the results are the same: most of the requests are timing out.
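For reference, the zmq_poll() variant of the timeout would look roughly like this (again an illustrative sketch, not the original code):
zmq_pollitem_t items[] = { { dealer, 0, ZMQ_POLLIN, 0 } };
int rc = zmq_poll(items, 1, 5);  // wait up to 5 ms for a response
if (rc > 0 && (items[0].revents & ZMQ_POLLIN)) {
    zmq_msg_t reply;
    zmq_msg_init(&reply);
    zmq_msg_recv(&reply, dealer, 0);  // response arrived in time
    // ... use the reply ...
    zmq_msg_close(&reply);
} else if (rc == 0) {
    // timed out: no response within 5 ms
}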
I went through the zmq documentation for details, but didn't find anything.
Can someone please explain the reason for this behaviour?

Synchronous ActiveMQ webservice

I have a webservice (RESTful) that sends a message through ActiveMQ and synchronously receives the response by creating a temporary listener in the same request.
The problem is that the listener waits for the response of the synchronous process but never dies. I need the listener to receive the response and then stop immediately, as soon as the web service request has been answered.
This is a big problem, because a listener is created for each web service request and stays active, producing overhead.
The code in the link is not production grade - it is simply an example of how to make a "hello world" request-reply.
Here is some pseudo code for consuming the response in a blocking way - and closing the consumer afterwards.
MessageConsumer responseConsumer = session.createConsumer(tempDest);
Message response = responseConsumer.receive(waitTimeout);
// TODO: handle the response message
responseConsumer.close();
Temp destinations in JMS are pretty slow anyway. You can instead use JMSCorrelationID and make the replies go to a "regular queue" handled by a single consumer for all replies. That way, you need some thread-handling code to hand the message over to the web service thread, but it will be non-blocking and very fast.

How to increase the socket timeout on the server side using Restify?

I use restify to implement a Node.js server. Basically the server runs a time-consuming process per HTTP POST request, but somehow the socket gets closed and the client receives an error message like this:
[Error: socket hang up] code: 'ECONNRESET'
According to the error type, the socket is definitely closed on the server side.
Is there any option that I can set in restify's createServer method to solve this problem?
Edit:
The long-running process is using Mongoose to run a MongoDB operation. Maybe the socket hang-up is caused by the connection to MongoDB? How do I increase the timeout for Mongoose? I found that the hang-up happens at exactly 120 seconds, so it might be because of some default timeout configuration?
Thanks in advance!
You can use the standard socket on the req object and manually call setTimeout to increase the time before Node hangs up the socket. By default, Node has a 2-minute inactivity timer on all sockets, which is why you are getting hang-ups at exactly 120 s (this has nothing to do with restify). As an example of increasing that, set up a handler to run before your long-running task, like this:
server.use(function (req, res, next) {
    // This will set the idle timer to 10 minutes
    req.connection.setTimeout(600 * 1000);
    res.connection.setTimeout(600 * 1000); // added in a later edit
    next();
});
This seems not to be actually implemented:
https://github.com/mcavage/node-restify/issues/288

Netty file transfer proxy suffers big connection delay under high concurrency

I am working on a project to build a file transfer proxy using Netty which should efficiently handle high concurrency.
Here is my structure:
Back Server, a normal file server just like the Http(File Server) example on netty.io, which receives and confirms a request and sends out a file using either ChunkedBuffer or zero-copy.
Proxy, with both a NioServerSocketChannelFactory and a NioClientSocketChannelFactory, both using a cachedThreadPool, listening for clients' requests and fetching the file from the Back Server back to the clients. Once a new client is accepted, the newly accepted Channel (channel1), created by the NioServerSocketChannelFactory, waits for the request. Once the request is received, the Proxy establishes a new connection to the Back Server using the NioClientSocketChannelFactory, and the new Channel (channel2) sends the request to the Back Server and delivers the response to the client. channel1 and channel2 each use their own pipeline.
More simply, the procedure is:
channel1 is accepted
channel1 receives the request
channel2 connects to the Back Server
channel2 sends the request to the Back Server
channel2 receives the response (including the file) from the Back Server
channel1 sends the response obtained from channel2 to the client
once the transfer is done, channel2 closes and channel1 closes on flush (each client only sends one request)
Since the requested file can be big (10 MB), the proxy turns off channel2.readable when channel1 is NOT writable, just like the Proxy Server example on netty.io.
With the above structure, each client has one accepted Channel, and once it sends a request it also corresponds to one client Channel until the transfer is done.
Then I use ab (Apache Bench) to fire thousands of requests at the proxy and evaluate the request time. The Proxy, Back Server and Client are three boxes on one rack with no other traffic load.
The results are weird:
File size 10 MB: when concurrency is 1, the connection delay is very small, but when concurrency increases from 1 to 10, the top 1% of connection delays becomes incredibly high, up to 3 seconds. The other 99% are very small. When concurrency increases to 20, the top 1% goes to 8 seconds, and it even causes ab to time out if concurrency is higher than 100. The processing delay for 90% of the requests usually grows linearly with the concurrency, but the top 1% can abnormally go very high at a seemingly random concurrency level (it varies over multiple test runs).
File size 1 KB: everything is fine, at least with concurrency below 100.
Putting them all on a single local machine, there is no connection delay.
Can anyone explain this issue and tell me which part is wrong? I have seen many benchmarks online, but they are pure ping-pong tests rather than this kind of large file transfer and proxy setup. Hope this is interesting to you guys :)
Thank you!
========================================================================================
After some source code reading today, I found one place that may prevent new sockets from being accepted. In NioServerSocketChannelSink.bind(), the boss executor calls Boss.run(), which contains a for loop for accepting incoming sockets. In each iteration of this loop, after getting the accepted channel, AbstractNioWorker.register() is called, which is supposed to add the new socket to the selector running in the worker executor. However, in
register(), a mutex called startStopLock has to be acquired before the worker executor is invoked. This startStopLock is also used in AbstractNioWorker.run() and AbstractNioWorker.executeInIoThread(), both of which check the mutex before they invoke the worker thread. In other words, startStopLock is used in 3 functions. If it is locked in AbstractNioWorker.register(), the for loop in Boss.run() will be blocked, which can cause the accept delay for incoming connections. Hope this is going to help.