I am working on a gRPC-based service that requires high throughput, but my program currently suffers from low throughput when using the C++ synchronous gRPC API.
I've read through the gRPC documentation, but I can't find an explicit explanation of the difference between the sync and async APIs, other than that the async API gives control over the completion queue, which is transparent to the sync API.
I want to know: does synchronous gRPC send a message down to the TCP layer and wait for its "ack", so that the next message is blocked?
And would the async API send messages asynchronously, without later messages having to wait?
TLDR: Yes, the async API sends messages asynchronously, without later messages waiting, while the synchronous API blocks the whole thread while one message is being sent/received.
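For contrast, a synchronous server handler looks roughly like this (a minimal sketch based on the standard helloworld example; with the synchronous API, the thread running the handler is tied up for the whole RPC):

#include <grpcpp/grpcpp.h>
#include "helloworld.grpc.pb.h"

using grpc::ServerContext;
using grpc::Status;
using helloworld::Greeter;
using helloworld::HelloReply;
using helloworld::HelloRequest;

// With the synchronous API, each in-flight RPC occupies one thread from
// gRPC's internal pool for its entire lifetime.
class GreeterServiceImpl final : public Greeter::Service {
  Status SayHello(ServerContext* /*context*/, const HelloRequest* request,
                  HelloReply* reply) override {
    reply->set_message("Hello " + request->name());
    return Status::OK;  // the reply is sent after the handler returns
  }
};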
gRPC uses a CompletionQueue for its asynchronous operations. You can find the official tutorial here: https://grpc.io/docs/languages/cpp/async/
A CompletionQueue is an event queue. An "event" here can be the completion of receiving request data, the expiry of an alarm (timer), etc.; basically, the completion of any asynchronous operation.
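For instance, here is a minimal sketch using grpc::Alarm, which posts a tag to a completion queue when its deadline expires (exact API details vary slightly between gRPC versions):

#include <cassert>
#include <grpcpp/alarm.h>
#include <grpcpp/grpcpp.h>

int main() {
  grpc::CompletionQueue cq;
  int tag_value = 0;  // any unique address can serve as a tag

  // Absolute deadline one second from now.
  gpr_timespec deadline = gpr_time_add(
      gpr_now(GPR_CLOCK_MONOTONIC), gpr_time_from_seconds(1, GPR_TIMESPAN));
  grpc::Alarm alarm(&cq, deadline, &tag_value);

  void* tag;
  bool ok;
  cq.Next(&tag, &ok);          // blocks for ~1 second until the alarm "fires"
  assert(tag == &tag_value);   // the event is identified by its tag
}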
Using the official gRPC asynchronous-API example, focus on the CallData class and HandleRpcs():
void HandleRpcs() {
  // Spawn a new CallData instance to serve new clients.
  new CallData(&service_, cq_.get());
  void* tag;  // uniquely identifies a request.
  bool ok;
  while (true) {
    // Block waiting to read the next event from the completion queue. The
    // event is uniquely identified by its tag, which in this case is the
    // memory address of a CallData instance.
    // The return value of Next should always be checked. This return value
    // tells us whether there is any kind of event or cq_ is shutting down.
    GPR_ASSERT(cq_->Next(&tag, &ok));
    GPR_ASSERT(ok);
    static_cast<CallData*>(tag)->Proceed();
  }
}
HandleRpcs() is the main loop of the server: an infinite loop that continuously gets the next event from the completion queue via cq_->Next() and calls that event's Proceed() method (our custom method that processes a client request according to its current state).
The CallData class (an instance of which represents the complete processing cycle of one client request):
class CallData {
 public:
  // Take in the "service" instance (in this case representing an asynchronous
  // server) and the completion queue "cq" used for asynchronous communication
  // with the gRPC runtime.
  CallData(Greeter::AsyncService* service, ServerCompletionQueue* cq)
      : service_(service), cq_(cq), responder_(&ctx_), status_(CREATE) {
    // Invoke the serving logic right away.
    Proceed();
  }

  void Proceed() {
    if (status_ == CREATE) {
      // Make this instance progress to the PROCESS state.
      status_ = PROCESS;
      // As part of the initial CREATE state, we *request* that the system
      // start processing SayHello requests. In this request, "this" acts as
      // the tag uniquely identifying the request (so that different CallData
      // instances can serve different requests concurrently), in this case
      // the memory address of this CallData instance.
      service_->RequestSayHello(&ctx_, &request_, &responder_, cq_, cq_, this);
    } else if (status_ == PROCESS) {
      // Spawn a new CallData instance to serve new clients while we process
      // the one for this CallData. The instance will deallocate itself as
      // part of its FINISH state.
      new CallData(service_, cq_);

      // The actual processing.
      std::string prefix("Hello ");
      reply_.set_message(prefix + request_.name());

      // And we are done! Let the gRPC runtime know we've finished, using the
      // memory address of this instance as the uniquely identifying tag for
      // the event.
      status_ = FINISH;
      responder_.Finish(reply_, Status::OK, this);
    } else {
      GPR_ASSERT(status_ == FINISH);
      // Once in the FINISH state, deallocate ourselves (CallData).
      delete this;
    }
  }

 private:
  // The means of communication with the gRPC runtime for an asynchronous
  // server.
  Greeter::AsyncService* service_;
  // The producer-consumer queue for asynchronous server notifications.
  ServerCompletionQueue* cq_;
  // Context for the rpc, allowing to tweak aspects of it such as the use
  // of compression, authentication, as well as to send metadata back to the
  // client.
  ServerContext ctx_;
  // What we get from the client.
  HelloRequest request_;
  // What we send back to the client.
  HelloReply reply_;
  // The means to get back to the client.
  ServerAsyncResponseWriter<HelloReply> responder_;
  // Let's implement a tiny state machine with the following states.
  enum CallStatus { CREATE, PROCESS, FINISH };
  CallStatus status_;  // The current serving state.
};
As we can see, a CallData has three states: CREATE, PROCESS and FINISH.
A request routine looks like this:
1. At startup, preallocate one CallData instance for a future incoming client.
2. During the construction of that CallData object, service_->RequestSayHello(&ctx_, &request_, &responder_, cq_, cq_, this) gets called, which tells gRPC to prepare for the reception of exactly one SayHello request.
   - At this point we don't know where the request will come from or when; we are just telling gRPC that we are ready to process one when it actually arrives, and asking gRPC to notify us when that happens.
   - The arguments to RequestSayHello tell gRPC where to put the context, request body and responder after a request is received, as well as which completion queue to use for the notification and which tag to attach to the notification event (here, this is used as the tag).
3. HandleRpcs() blocks on cq_->Next(), waiting for an event to occur.
(some time later....)
4. A client makes a SayHello request to the server; gRPC starts receiving and decoding that request. (IO operation)
(some time later....)
5. gRPC has finished receiving the request. It puts the request body into the request_ field of the CallData object (via the pointer supplied earlier), then creates an event (with the pointer to the CallData object as the tag, as requested via the last argument to RequestSayHello) and puts that event into the completion queue cq_.
6. The loop in HandleRpcs() receives the event (the previously blocked call to cq_->Next() now returns) and calls CallData::Proceed() to process the request. Since status_ of the CallData is PROCESS, it does the following:
   6.1. Creates a new CallData object, so that new client requests arriving after this one can be processed.
   6.2. Generates the reply for the current request and tells gRPC that we have finished processing it and the reply should be sent back to the client.
   6.3. gRPC starts transmission of the reply. (IO operation)
   6.4. The loop in HandleRpcs() goes into the next iteration and blocks on cq_->Next() again, waiting for a new event to occur.
(some time later....)
7. gRPC has finished transmitting the reply and tells us so by again putting an event into the completion queue, with the pointer to the CallData as the tag.
8. cq_->Next() receives that event and returns; CallData::Proceed() deallocates the CallData object (via delete this;). HandleRpcs() loops and blocks on cq_->Next() again, waiting for a new event.
It might look like the process is largely the same as with the synchronous API, just with extra access to the completion queue. However, by doing it this way, at each and every some time later.... (usually spent waiting for an IO operation to complete or for a request to arrive), cq_->Next() can receive completion events not only for this request, but for other requests as well.
So if a new request comes in while the first request is, say, waiting for the transmission of its reply to finish, cq_->Next() will get the event emitted by the new request and start processing it immediately and concurrently, instead of waiting for the first request to finish its transmission.
The synchronous API, on the other hand, always waits for the full completion of one request (from the start of reception to the end of the reply) before even starting to receive another one. This means near-0% CPU utilization while receiving request body data and sending back reply data (IO operations). Precious CPU time that could have been used to process other requests is wasted on just waiting.
This is really bad: if a client with a poor internet connection (100 ms round-trip) sends requests to the server, we have to spend at least 200 ms per request from this client just actively waiting for the TCP transmission to finish. That brings our throughput down to only ~5 requests per second (1000 ms / 200 ms).
Whereas with the asynchronous API, we simply don't actively wait for anything. We tell gRPC: "please send this data to the client, but we will not wait for you to finish here. Instead, just put a little letter into the completion queue when you are done, and we'll check it later", and move on to process other requests.
Related information
You can see how a simple server is written with both the synchronous API and the asynchronous API in the official gRPC examples.
Best performance practices
The best performance practice suggested by the gRPC C++ Performance Notes is to spawn a number of threads equal to your CPU core count and use one CompletionQueue per thread.
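A rough sketch of that setup, reusing the CallData class from above (Run() is a hypothetical wrapper; error handling and shutdown are omitted):

#include <algorithm>
#include <memory>
#include <thread>
#include <vector>

void Run() {
  Greeter::AsyncService service;
  grpc::ServerBuilder builder;
  builder.AddListeningPort("0.0.0.0:50051", grpc::InsecureServerCredentials());
  builder.RegisterService(&service);

  // One completion queue per hardware thread.
  unsigned n = std::max(1u, std::thread::hardware_concurrency());
  std::vector<std::unique_ptr<grpc::ServerCompletionQueue>> cqs;
  for (unsigned i = 0; i < n; ++i)
    cqs.push_back(builder.AddCompletionQueue());

  std::unique_ptr<grpc::Server> server = builder.BuildAndStart();

  // Each thread runs the same loop as HandleRpcs(), on its own queue.
  std::vector<std::thread> workers;
  for (auto& q : cqs)
    workers.emplace_back([&service, cq = q.get()] {
      new CallData(&service, cq);  // seed one pending request per queue
      void* tag;
      bool ok;
      while (cq->Next(&tag, &ok)) {
        if (ok) static_cast<CallData*>(tag)->Proceed();
      }
    });
  for (auto& t : workers) t.join();
}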
Related
I'm using asio to run a TCP server.
For each message the server receives, one or more responses are returned.
Most of the messages get simple replies, but some are commands that run an action, which can take up to 10 minutes before it returns a message (and only one action can run at a time).
I start my session function in a new thread, passing it a tcp::socket when a connection is made:
tcp::acceptor a(io_context, tcp::endpoint(tcp::v4(), port));
for (;;) {
  std::thread(session, a.accept()).detach();
}
But after that, the tcp::socket is "stuck" in the session function. I can't pass the socket anywhere else (so far not without compilation errors), and the whole exchange has to happen inside session because it:
Receives the message using socket.read_some()
Processes the message (and triggers an action if required)
Transmits a response using asio::write()
I need to be able to interrupt step 2 when a new message is received, but without sharing the socket I don't know how.
Whichever way I look at it, the socket can only be used by one thread, so I'll either be waiting for a new message or waiting for a response to be generated, both of which would block each other.
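For reference, the session function described above would look roughly like this (a sketch; handler() is a hypothetical stand-in for step 2, and each step blocks the session thread in turn, which is why nothing can interrupt it):

#include <asio.hpp>
#include <cstddef>
#include <string>

using asio::ip::tcp;

// Hypothetical stand-in for step 2; may run an action for up to 10 minutes.
std::string handler(const std::string& message);

void session(tcp::socket socket) {
  try {
    for (;;) {
      char data[1024];
      std::size_t n = socket.read_some(asio::buffer(data));  // step 1: blocks
      std::string response = handler(std::string(data, n));  // step 2: blocks
      asio::write(socket, asio::buffer(response));           // step 3: blocks
    }
  } catch (const std::exception&) {
    // Connection closed or errored; the session thread ends here.
  }
}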
I have created a gRPC async client written in C++ which makes both streaming and unary requests to a server, using a completion queue.
In the destructor of the client class, the Shutdown method of the completion queue is called. I then thought I could call Next to drain the queue and obtain the pending tags, but instead the call to Next blocks everything.
The pending tags are needed because they are objects created with new, and they must be deleted to avoid leaks.
What is the correct way to drain a queue used for an async client?
It should be 1 tag into the completion queue, 1 tag out, so all pending operations will get their tags returned from Next (even if the RPC gets cancelled).
If Next blocks, it is most likely because there are still pending operations that have not finished.
You may want to use ClientContext::TryCancel to terminate those calls quickly.
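A drain might then look roughly like this (a sketch, assuming the tags are heap-allocated call objects; AsyncCall is a hypothetical name for them):

// In the client's destructor: shut the queue down, then drain it.
cq_.Shutdown();
void* tag = nullptr;
bool ok = false;
// After Shutdown(), Next() keeps returning the tags of pending operations
// (ok indicates whether each one completed successfully) and returns
// false once the queue is fully drained.
while (cq_.Next(&tag, &ok)) {
  delete static_cast<AsyncCall*>(tag);  // AsyncCall is hypothetical
}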
The use case is this:
An actor is bound to spray IO, receiving and handling all inbound HTTP requests coming through a specified port.
For each inbound request, the actor needs to send an asynchronous outbound HTTP request to a different external endpoint, get back a response, and send a response to the originating party.
Spray's client-side sendReceive returns a Future. This means the actor will continue handling the next inbound message in its mailbox without waiting for the response to the outbound request it just sent. Meanwhile, the response to that outbound request may arrive and run in the Future's callback; since the callback is not queued on the actor's mailbox, it may execute in parallel, breaking the principle that an actor is executed by only one thread at a time.
I wonder how this use case can be handled without breaking the actor's thread encapsulation: how can an actor use spray-client (for sending/receiving asynchronous HTTP events) in an actor-safe way?
It is perfectly safe to complete with a future rather than the actual value in spray-routing, so, for instance, you can do the following:
get {
  complete {
    val resultFuture: Future[Result] = ...
    resultFuture.onComplete {....}
    resultFuture
  }
}
Of course, you will need to make sure that you handle timeouts and error conditions as well.
The question is which thread executes the callback: if it is not queued on the actor's mailbox, it could run in parallel with the actor's receive handling, which might break its thread encapsulation...
To my understanding, the same issue exists with the Akka actor 'ask' method, which returns a Future; the documentation warns not to operate on the actor's mutable state from within the callback, since it may cause synchronization problems. See: http://doc.akka.io/docs/akka/snapshot/scala/actors.html
"Warning:
When using future callbacks, such as onComplete, onSuccess, and onFailure, inside actors you need to carefully avoid closing over the containing actor’s reference, i.e. do not call methods or access mutable state on the enclosing actor from within the callback. This would break the actor encapsulation and may introduce synchronization bugs and race conditions because the callback will be scheduled concurrently to the enclosing actor. Unfortunately there is not yet a way to detect these illegal accesses at compile time."
I have a function that gives recommendations to users. The function needs to do a lot of computation to start up, but after that it uses an already-computed matrix kept in memory. Any further calculation then "fills in" the in-memory object for continuous learning.
My intention is to expose this function to website users, but the responses need to come from that same object in memory, and requests must be processed sequentially because it is not thread-safe.
What is the best way to get this working? My first idea was to use SignalR, so the user doesn't have to block waiting for the response, plus a queue to send the requests to the object. But how can SignalR receive the response for a specific request?
The entire flow is:
1. A user enters a page.
2. JavaScript calls a service with the user ID and the current page.
3. The server queues the ID and page.
4. The service calculates the results for each request in the queue and sends responses.
5. The server "receives" each response and sends it back to the client.
The main problem is that I don't see a way for the service to receive the response and send it back to the client once it is complete, without looping over queues.
Thanks!
If you are going to use SignalR, I would suggest using a hub method to accept these potentially long-running requests from the client. By doing so, it should be obvious "how SignalR can receive the response for this specific request".
You should be able to queue your calculations from inside your hub method where you will have access to the caller's connection id (via the Context.ConnectionId property).
If you can await the results of your queued operation inside of the hub method you queue from, you can then simply return the result from your hub method and SignalR will flow the result back to the calling JavaScript. You can also use Clients.Caller.... to send the result back.
If you go this route, I suggest using async/await instead of blocking request threads while waiting for your long-running calculations to complete.
http://www.asp.net/signalr/overview/signalr-20/hubs-api/hubs-api-guide-server
If you can't process your calculation results from the same method you queued the calculation from, you still have options. Just be sure to queue the caller's connection id and a request id along with the calculation to be processed.
Then, you can process the results of all your calculations from outside of your hub using GlobalHost.ConnectionManager.GetHubContext:
private IHubContext _context = GlobalHost.ConnectionManager.GetHubContext<MyHub>();

// Call ProcessResults whenever results are ready to send back to the client
public void ProcessResults(string connectionId, uint requestId, MyResult result)
{
    // Presumably there's JS code mapping request ids to results
    // if you can have multiple ongoing requests per client
    _context.Clients.Client(connectionId).receiveResult(requestId, result);
}
http://www.asp.net/signalr/overview/signalr-20/hubs-api/hubs-api-guide-server#callfromoutsidehub
I am looking for a cross platform method of notifying several client applications that a service/daemon has started and is able to handle incoming connections. The clients will be running all the time, whereas the service may not. Normally the service/daemon will be started automatically when the computer starts, but in some cases it may not and in those cases the clients should automatically connect when the service/daemon starts.
The basic flow of the client is to wait until they notice that the service is running, then connect. If the connection is interrupted or if they were unable to connect they just try again from the beginning.
For Windows I have a solution where the service signals a global event object when it starts, so the clients can wait on this event. This works OK in practice, but I am pretty sure it does not handle all potential cases (such as a crashing service or multiple instances of the service running). I don't mind if the clients "accidentally" wake up every now and then even though the service isn't running; I just want to avoid the clients entering a busy loop of constant connection attempts while still responding quickly when the service starts. I.e., just adding a sleep between connection attempts is not great.
Is there a cross platform method to detect whether the service is running and ready to accept connections?
Update: I'll add a bit more information on how the current mechanism works on Windows, with approximate code from memory, so please excuse any typos:
Service:
SECURITY_ATTRIBUTES sa;
// Set up empty SECURITY_ATTRIBUTES so that everyone has access
// ...
// Create a manual reset event, initially set to nonsignaled
HANDLE event = ::CreateEvent(&sa, TRUE, FALSE, "Global\\unique_name");
// Signal the event - service is running and ready
::SetEvent(event);
// Handle connections, do work
// If the service dies for whatever reason, Windows closes its handle.
// The event object is destroyed when the last open handle to it is closed,
// so the event stays alive (and signaled) at least as long as the service lives.
Clients:
while (true) {
  // Set up the event the same way as the service, including empty security attributes
  // ...
  HANDLE event = ::CreateEvent(&sa, TRUE, FALSE, "Global\\unique_name");
  // Wait for the service to start
  DWORD ret = ::WaitForSingleObject(event, INFINITE);
  // Close the handle to avoid keeping the event object alive.
  // This isn't enough in theory, but works in real usage as the number
  // of clients will always be low.
  ::CloseHandle(event);
  // Check if we woke up because the event is signaled
  if (WAIT_OBJECT_0 == ret) {
    // connect to service, do work
    // ...
  }
}
How could I achieve approximately the same on OS X and Linux?