I have created an async C++ gRPC server that offers several APIs with a signature similar to this:
service Foo {
rpc FunctionalityA(ARequest) returns (stream AResponse);
rpc FunctionalityB(BRequest) returns (stream BResponse);
}
The client creates one channel to connect to this service and calls the various RPCs from separate threads, something like this:
class FooClient {
// ...
void FunctionalityA() {
auto stub = example::Foo::NewStub(m_channel);
grpc::ClientContext context;
example::ARequest request;
example::AResponse response;
auto reader = stub->FunctionalityA(&context, request);
for(int i = 0; i < 3; i++) {
reader->Read(&response);
}
}
void FunctionalityB() {
auto stub = example::Foo::NewStub(m_channel);
grpc::ClientContext context;
example::BRequest request;
example::BResponse response;
auto reader = stub->FunctionalityB(&context, request);
for(int i = 0; i < 3; i++) {
reader->Read(&response);
}
}
// ...
};
int main() {
// ...
FooClient client(grpc::CreateChannel("127.0.0.1:12345", grpc::InsecureChannelCredentials()));
auto ta = std::thread(&FooClient::FunctionalityA, &client);
auto tb = std::thread(&FooClient::FunctionalityB, &client);
// ...
}
I want to implement the server so that:
when FunctionalityA is called, it starts streaming objects of type AResponse
when FunctionalityB is called, it starts streaming objects of type BResponse
when the context used to call FunctionalityA is cancelled, streaming of AResponse ends
when the context used to call FunctionalityB is cancelled, streaming of BResponse ends
The problem I face is that even when the ClientContext associated with one of the two functionalities goes out of scope (after the 3 reads in the example), the server receives no notification and keeps writing, and the "ok" status remains true.
The "ok" status only goes to false, allowing me to stop writing, when the client disconnects.
Is this the intended behavior of gRPC? Does the client need to send a specific "kiss of death" message in order to inform the server to stop writing on the stream?
Here is an example of the implementation of a Functionality server side, for completeness:
void FunctionalityB::ProcessRequest(bool ok, RequestState state) {
if(!ok) {
if(state == RequestState::START) {
// the server has been Shutdown before this particular call got matched to an incoming RPC
delete this;
} else if(state == RequestState::WRITE || state == RequestState::FINISH) {
// not going to the wire because the call is already dead (i.e., canceled, deadline expired, other side dropped the channel, etc).
delete this;
} else {
// unhandled state
}
} else {
if(state == RequestState::START) {
// the RPC has indeed been started
m_writer.Write(m_response, CreateTag(RequestState::WRITE));
// the constructor of the functionality requests a new one to handle future new connections
new FunctionalityB(m_completion_queue, m_service, m_worker);
} else if(state == RequestState::WRITE) {
// TODO do some real work
std::this_thread::sleep_for(std::chrono::milliseconds(50));
m_writer.Write(m_response, CreateTag(RequestState::WRITE)); // this write will continue forever, even after client stops reading and TryCancel its context
} else if(state == RequestState::FINISH) {
delete this;
} else {
// unhandled state
}
}
}
There are two ways to detect call cancellation on the server.
The first one is to check ServerContext::IsCancelled(). That is something you can check right before you do a write, which in this case may be fine. In the general case, though, it may not be ideal, because your application might be waiting for some other event (other than the previous write completing) before it does another write, and you ideally want some async way of getting notified when the cancellation happens.
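For illustration, here is a minimal sketch of this first approach grafted onto the FunctionalityB handler from the question; m_server_context is assumed to be the grpc::ServerContext member that was used when the call was requested:

void FunctionalityB::ProcessRequest(bool ok, RequestState state) {
    // ... same !ok handling as in the question ...
    if (ok && state == RequestState::WRITE) {
        if (m_server_context.IsCancelled()) {
            // the client cancelled: stop writing and finish the call
            m_writer.Finish(grpc::Status::CANCELLED,
                            CreateTag(RequestState::FINISH));
        } else {
            m_writer.Write(m_response, CreateTag(RequestState::WRITE));
        }
    }
    // ...
}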
Which brings me to the second approach, which is to request an event on the completion queue when the call is cancelled by calling ServerContext::AsyncNotifyWhenDone() before the RPC starts. This will give you async notification of the cancellation, but unfortunately, the API is very cumbersome and has a few sharp edges. (This is something that is handled much more cleanly in the new callback-based API, but that API isn't that performant in OSS until we finish the EventEngine effort.)
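To make the second approach concrete, here is a hedged sketch of how it could be wired into the question's handler. RequestState::DONE and m_cancelled are names I am inventing for illustration; the rest mirrors the question's members. Treat this as a sketch of the pattern, not a drop-in implementation:

FunctionalityB::FunctionalityB(/* ... */) {
    // must be registered before the RPC starts
    m_server_context.AsyncNotifyWhenDone(CreateTag(RequestState::DONE));
    m_service->RequestFunctionalityB(&m_server_context, &m_request, &m_writer,
                                     m_completion_queue, m_completion_queue,
                                     CreateTag(RequestState::START));
}

void FunctionalityB::ProcessRequest(bool ok, RequestState state) {
    if (state == RequestState::DONE) {
        // delivered when the RPC finishes for any reason; IsCancelled() is
        // now meaningful
        m_cancelled = m_server_context.IsCancelled();
        // sharp edge: other tags (e.g. a pending WRITE) may still be in
        // flight, so only delete the handler once every outstanding tag has
        // been drained from the completion queue
        return;
    }
    // ... existing ok/state handling ...
}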
I hope this info is helpful.
Related
I am using a very simple proto where the Message contains only 1 string field. Like so:
service LongLivedConnection {
// Starts a grpc connection
rpc Connect(Connection) returns (stream Message) {}
}
message Connection{
string userId = 1;
}
message Message{
string serverMessage = 1;
}
The use case is that the client connects to the server, and the server uses this gRPC stream to push messages.
Now, for the client code, assuming that I am already in a worker thread, how do I properly set it up so that I can continuously receive messages that come from the server at random times?
void StartConnection(const std::string& user) {
Connection request;
request.set_userid(user);
Message message;
ClientContext context;
auto reader = stub_->Connect(&context, request); // server-streaming call returns a reader
// What should I do from now on?
// notify(serverMessage);
}
void notify(std::string message) {
// generate message events and pass to main event loop
}
I figured out how to use the API. It looks pretty flexible, but still a little weird, given that I typically expect an async API to take some kind of lambda callback.
The code below is blocking, so you'll have to run it in a different thread to avoid blocking your application.
I believe you can have multiple threads accessing the CompletionQueue, but in my case I just had one single thread handling this gRPC connection.
GrpcConnection.h file:
class GrpcConnectionService {
public:
    void StartGrpcConnection();
private:
    std::shared_ptr<grpc::Channel> m_channel;
    std::unique_ptr<grpc::ClientAsyncReader<LongLiveConnection::Message>> m_reader;
    std::unique_ptr<LongLiveConnection::LongLiveConnectionService::Stub> m_stub;
};
GrpcConnection.cpp file:
...
void GrpcConnectionService::StartGrpcConnection()
{
m_channel = grpc::CreateChannel("localhost:50051",grpc::InsecureChannelCredentials());
LongLiveConnection::Connection request;
request.set_user_id("12345");
m_stub = LongLiveConnection::LongLiveConnectionService::NewStub(m_channel);
grpc::ClientContext context;
grpc::CompletionQueue cq;
std::unique_ptr<grpc::ClientAsyncReader<LongLiveConnection::Message>> reader =
m_stub->PrepareAsyncConnect(&context, request, &cq);
void* got_tag;
bool ok = false;
LongLiveConnection::Message reply;
reader->StartCall((void*)1);
cq.Next(&got_tag, &ok);
if (ok && got_tag == (void*)1)
{
// startCall() is successful if ok is true, and got_tag is void*1
// start the first read message with a different hardcoded tag
reader->Read(&reply, (void*)2);
while (true)
{
ok = false;
cq.Next(&got_tag, &ok);
if (!ok)
{
    // the call is dead (cancelled, disconnected, ...); stop reading
    break;
}
if (got_tag == (void*)2)
{
// this is the message from server
std::string body = reply.server_message();
// do whatever you want with body, in my case i push it to my applications' event stream to be processed by other components
// lastly, initialize another read
reader->Read(&reply, (void*)2);
}
else if (got_tag == (void*)3)
{
// if you do something else, such as listening to GRPC channel state change, in your call, you can pass a different hardcoded tag, then, in here, you will be notified when the result is received from that call.
}
}
}
}
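One thing the loop above leaves open is shutdown. A possible sketch, assuming context and cq are promoted to class members (say m_context and m_cq) so another thread can reach them:

// on the thread requesting shutdown: abort the in-flight Read() and
// refuse any new events
m_context.TryCancel();
m_cq.Shutdown();

// on the loop thread: drain until Next() returns false, then exit
void* tag = nullptr;
bool event_ok = false;
while (m_cq.Next(&tag, &event_ok)) {
    // discard remaining events; false means the queue is fully drained
}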
We've implemented a Java gRPC service that runs in the cloud, with a unidirectional (client-to-server) streaming RPC which looks like:
rpc PushUpdates(stream Update) returns (Ack);
A C++ client (a mobile device) calls this rpc as soon as it boots up, to continuously send an update every 30 or so seconds, perpetually as long as the device is up and running.
ChannelArguments chan_args;
// this will be secure channel eventually
auto channel_p = CreateCustomChannel(remote_addr, InsecureChannelCredentials(), chan_args);
auto stub_p = DialTcc::NewStub(channel_p);
// ...
Ack ack;
auto strm_ctxt_p = make_unique<ClientContext>();
auto strm_p = stub_p->PushUpdates(strm_ctxt_p.get(), &ack);
// ...
while (true) {
// wait until we are ready to send a new update
Update updt;
// populate updt;
if(!strm_p->Write(updt)) {
// stream is not kosher, create a new one and restart
break;
}
}
Now, different kinds of network interruption occur while this runs:
the gRPC service running in the cloud may go down (for maintenance) or may simply become unreachable.
the device's own ip address keeps changing as it is a mobile device.
We've seen that on such events, neither the channel nor the Write() API detects the network disconnection reliably. At times the client keeps calling Write() (which doesn't return false) but the server doesn't receive any data (wireshark doesn't show any activity at the outgoing port of the client device).
What are the best practices to recover in such cases, so that the server starts receiving the updates within X seconds from the time such an event occurs? It is understandable that there would be a loss of X seconds' worth of data whenever such an event happens, but we want to recover reliably within X seconds.
gRPC version: 1.30.2, Client: C++-14/Linux, Server: Java/Linux
Here's how we've hacked around this. I want to check whether this can be made any better, or whether anyone from gRPC can point me to a better solution.
The protobuf for our service looks like this. It has an RPC for pinging the service, which is used frequently to test connectivity.
// Message used in IsAlive RPC
message Empty {}
// Acknowledgement sent by the service for updates received
message UpdateAck {}
// Messages streamed to the service by the client
message Update {
...
...
}
service GrpcService {
// for checking if we're able to connect
rpc Ping(Empty) returns (Empty);
// streaming RPC for pushing updates by client
rpc PushUpdate(stream Update) returns (UpdateAck);
}
Here is how the C++ client looks; it does the following:
Connect():
Create the stub for calling the RPCs, if the stub is nullptr.
Call Ping() in regular intervals until it is successful.
On success call PushUpdate(...) RPC to create a new stream.
On failure reset the stream to nullptr.
Stream(): Do the following in a while(true) loop:
Get the update to be pushed.
Call Write(...) on the stream with the update to be pushed.
If Write(...) fails for any reason break and the control goes back to Connect().
Once in every 15 minutes (or some regular interval), reset everything (stub, channel, stream) to nullptr to start afresh. This is required because at times Write(...) does not fail even when there is no connection between the client and the service. The Write(...) calls succeed, but the outgoing port on the client shows no activity on wireshark!
Here is the code:
constexpr int GRPC_TIMEOUT_S = 10;
constexpr int RESTART_INTERVAL_M = 15;
constexpr int GRPC_KEEPALIVE_TIME_MS = 10000;
string root_ca, tls_key, tls_cert; // for SSL
string remote_addr = "remote.com:5445"; // host:port target, no URL scheme
...
...
void ResetStreaming() {
if (stub_p) {
if (strm_p) { // graceful restart/stop; this pair of APIs is called together, in this order
if (!strm_p->WritesDone()) {
// Log a message
}
strm_p->Finish(); // Log if return value of this is NOT grpc::OK
}
strm_p = nullptr;
strm_ctxt_p = nullptr;
stub_p = nullptr;
channel_p = nullptr;
}
}
void CreateStub() {
if (!stub_p) {
ChannelArguments chan_args;
chan_args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, GRPC_KEEPALIVE_TIME_MS);
channel_p = CreateCustomChannel(
remote_addr,
SslCredentials(SslCredentialsOptions{root_ca, tls_key, tls_cert}),
chan_args);
stub_p = GrpcService::NewStub(channel_p);
}
}
void Stream() {
const auto restart_time = steady_clock::now() + minutes(RESTART_INTERVAL_M);
while (!stop) {
// restart every RESTART_INTERVAL_M (15m) even if ALL IS WELL!!
if (steady_clock::now() > restart_time) {
break;
}
Update updt = GetUpdate(); // get the update to be sent
if (!stop) {
if (channel_p->GetState(true) == GRPC_CHANNEL_SHUTDOWN ||
!strm_p->Write(updt)) {
// could not write!!
return; // we will Connect() again
}
}
}
// stopped due to stop = true or interval to create new stream has expired
ResetStreaming(); // channel, stub, stream are recreated once in every 15m
}
bool PingRemote() {
ClientContext ctxt;
ctxt.set_deadline(system_clock::now() + seconds(GRPC_TIMEOUT_S));
Empty req, resp;
CreateStub();
if (stub_p->Ping(&ctxt, req, &resp).ok()) {
static UpdateAck ack;
strm_ctxt_p = make_unique<ClientContext>(); // need new context
strm_p = stub_p->PushUpdate(strm_ctxt_p.get(), &ack);
return true;
}
if (strm_p) {
strm_p = nullptr;
strm_ctxt_p = nullptr;
}
return false;
}
void Connect() {
while (!stop) {
if (PingRemote() || stop) {
break;
}
sleep_for(seconds(5)); // wait before retrying
}
}
// set to true from another thread when we want to stop
atomic<bool> stop = false;
void StreamUntilStopped() {
if (stop) {
return;
}
strm_thread_p = make_unique<thread>([&] {
while (!stop) {
Connect();
Stream();
}
});
}
// called by the thread that sets stop = true
void Finish() {
strm_thread_p->join();
}
With this we are seeing that the streaming recovers within 15 minutes (or RESTART_INTERVAL_M) whenever there is a disruption for any reason. This code runs in a fast path, so I am curious to know if this can be made any better.
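One avenue that might reduce the reliance on the blind 15-minute restart (an assumption on my part, not something validated in this exact setup) is tightening the client keepalive settings so that a dead TCP path causes Write() to fail within a bounded time. A sketch with illustrative values; note that the server must be configured to tolerate this ping rate, otherwise it may answer with GOAWAY:

grpc::ChannelArguments chan_args;
chan_args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, 10000);        // ping every 10 s
chan_args.SetInt(GRPC_ARG_KEEPALIVE_TIMEOUT_MS, 5000);      // dead if no ack in 5 s
chan_args.SetInt(GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS, 1);
chan_args.SetInt(GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA, 0); // do not cap pings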
I am now implementing the Raft algorithm, and I want to use gRPC streams to do it. My main idea is to create 3 streams from each node to every other peer, with one stream carrying one type of RPC: AppendEntries, RequestVote, or InstallSnapshot. I wrote some code with limited help from route_guide, because in its bidirectional-stream demo RouteChat, the client sends all its data before it starts to read.
Firstly, I want to be able to write to a stream at any time, so I wrote the following code:
void RaftMessagesStreamClientSync::AsyncRequestVote(const RequestVoteRequest& request){
std::string peer_name = this->peer_name;
debug("GRPC: Send RequestVoteRequest from %s to %s\n", request.name().c_str(), peer_name.c_str());
request_vote_stream->Write(request);
}
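One caveat worth flagging here: a grpc::ClientReaderWriter tolerates one concurrent reader plus one concurrent writer, but not several threads calling Write() at once. If AsyncRequestVote can be reached from more than one thread, the writes need external serialization; a minimal sketch, with a hypothetical per-stream mutex member:

void RaftMessagesStreamClientSync::AsyncRequestVote(const RequestVoteRequest& request) {
    std::lock_guard<std::mutex> lock(m_write_mutex); // hypothetical member
    if (!request_vote_stream->Write(request)) {
        // the stream is closed; the caller should trigger a reconnect
    }
}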
Meanwhile, I want a thread that keeps reading from a stream, like the following code, which is called immediately after RaftMessagesStreamClientSync is constructed:
void RaftMessagesStreamClientSync::handle_response(){
// strongThis is a must
auto strongThis = shared_from_this();
t1 = new std::thread([strongThis](){
RequestVoteResponse response;
while (strongThis->request_vote_stream->Read(&response)) {
debug("GRPC: Recv RequestVoteResponse from %s, me %s\n", response.name().c_str(), strongThis->raft_node->name.c_str());
...
}
});
...
In order to initialize the 3 streams, I have to write the constructor like this. I use 3 ClientContext objects here because the documentation says one ClientContext is for one RPC only:
struct RaftMessagesStreamClientSync : std::enable_shared_from_this<RaftMessagesStreamClientSync>{
typedef grpc::ClientReaderWriter<RequestVoteRequest, RequestVoteResponse> CR;
typedef grpc::ClientReaderWriter<AppendEntriesRequest, AppendEntriesResponse> CA;
typedef grpc::ClientReaderWriter<InstallSnapshotRequest, InstallSnapshotResponse> CI;
std::unique_ptr<CR> request_vote_stream;
std::unique_ptr<CA> append_entries_stream;
std::unique_ptr<CI> install_snapshot_stream;
ClientContext context_r;
ClientContext context_a;
ClientContext context_i;
std::thread * t1 = nullptr;
std::thread * t2 = nullptr;
std::thread * t3 = nullptr;
...
};
RaftMessagesStreamClientSync::RaftMessagesStreamClientSync(const char * addr, struct RaftNode * _raft_node) : raft_node(_raft_node), peer_name(addr) {
std::shared_ptr<Channel> channel = grpc::CreateChannel(addr, grpc::InsecureChannelCredentials());
stub = raft_messages::RaftStreamMessages::NewStub(channel);
// 1
request_vote_stream = stub->RequestVote(&context_r);
// 2
append_entries_stream = stub->AppendEntries(&context_a);
// 3
install_snapshot_stream = stub->InstallSnapshot(&context_i);
}
RaftMessagesStreamClientSync::~RaftMessagesStreamClientSync() {
raft_node = nullptr;
t1->join();
t2->join();
t3->join();
delete t1;
delete t2;
delete t3;
}
Then I implement the server side
Status RaftMessagesStreamServiceImpl::RequestVote(ServerContext* context, ::grpc::ServerReaderWriter< ::raft_messages::RequestVoteResponse, RequestVoteRequest>* stream){
RequestVoteResponse response;
RequestVoteRequest request;
while (stream->Read(&request)) {
...
}
return Status::OK;
}
Then 2 problems happen:
When I test with 3 nodes, which actually creates 2 RaftMessagesStreamClientSync instances per node, statements 1 to 3 take a long time to execute.
No RPC is received on the server side.
There are similar problems when using Bidi Async Server; however, I can't figure out how that post can help me.
UPDATE
After some debugging, I found that request_vote_stream->Write(request) returns 0, which, according to the documentation, means the stream is closed. However, why is it closed?
After some debugging, I found that the two problems are both due to one root cause: I create the client before I create the server.
Because I originally used unary RPC calls, a previous call from the client only caused gRPC error code 14, and the program carried on, since every call sent after the server was created could be handled correctly.
However, when it comes to streaming calls, stub->RequestVote(&context_r) ends up invoking a blocking constructor, ClientReaderWriter::ClientReaderWriter, which tries to connect to the server, which does not exist yet.
/// Block to create a stream and write the initial metadata and \a request
/// out. Note that \a context will be used to fill in custom initial metadata
/// used to send to the server when starting the call.
ClientReaderWriter(::grpc::ChannelInterface* channel,
const ::grpc::internal::RpcMethod& method,
ClientContext* context)
: context_(context),
cq_(grpc_completion_queue_attributes{
GRPC_CQ_CURRENT_VERSION, GRPC_CQ_PLUCK,
GRPC_CQ_DEFAULT_POLLING}), // Pluckable cq
call_(channel->CreateCall(method, context, &cq_)) {
if (!context_->initial_metadata_corked_) {
::grpc::internal::CallOpSet<::grpc::internal::CallOpSendInitialMetadata>
ops;
ops.SendInitialMetadata(context->send_initial_metadata_,
context->initial_metadata_flags());
call_.PerformOps(&ops);
cq_.Pluck(&ops);
}
}
As a consequence, the connection has not yet been established.
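Given that root cause, one hedged workaround is to wait for the channel to become ready before opening the three streams, so the blocking constructor never runs against a server that is not there yet. For example:

std::shared_ptr<grpc::Channel> channel =
    grpc::CreateChannel(addr, grpc::InsecureChannelCredentials());
auto deadline = std::chrono::system_clock::now() + std::chrono::seconds(5);
if (!channel->WaitForConnected(deadline)) {
    // the peer's server is not up yet: retry later instead of opening streams
}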
I'm trying to create a multithreaded server application using the mongoose web server library.
I have a main thread serving connections and sending requests to processors that work in their own threads. The processors then place results into a queue, and a queue observer must send the results back to the clients.
Sources are looking that way:
Here I prepare the data for processors and place it to queue.
typedef std::pair<struct mg_connection*, const char*> TransferData;
int server_app::event_handler(struct mg_connection *conn, enum mg_event ev)
{
Request req;
if (ev == MG_AUTH)
return MG_TRUE; // Authorize all requests
else if (ev == MG_REQUEST)
{
req = parse_request(conn);
task_queue->push(TransferData(conn,req.second));
mg_printf(conn, "%s", ""); // (1)
return MG_MORE; // (2)
}
else
return MG_FALSE; // Rest of the events are not processed
}
And here I'm trying to send the result back. This function runs in its own thread.
void server_app::check_results()
{
while(true)
{
TransferData res;
if(!res_queue->pop(res))
{
boost::this_thread::sleep_for(boost::chrono::milliseconds(100));
continue;
}
mg_printf_data(res.first, "%s", res.second); // (3)
}
}
The problem is that the client doesn't receive anything from the server.
If I run the check_results function manually in the event_handler after placing a task into the queue, and then pass the computed result back to event_handler, I'm able to send it to the client using mg_printf_data (and returning MG_TRUE). Any other way, I'm not.
What exactly should I change in these sources to make it work?
OK... It looks like I've solved it myself.
I'd been looking into the mongoose.c code, and an hour later I found the piece of code below:
static void write_terminating_chunk(struct connection *conn) {
mg_write(&conn->mg_conn, "0\r\n\r\n", 5);
}
static int call_request_handler(struct connection *conn) {
int result;
conn->mg_conn.content = conn->ns_conn->recv_iobuf.buf;
if ((result = call_user(conn, MG_REQUEST)) == MG_TRUE) {
if (conn->ns_conn->flags & MG_HEADERS_SENT) {
write_terminating_chunk(conn);
}
close_local_endpoint(conn);
}
return result;
}
So I've tried to do mg_write(&conn->mg_conn, "0\r\n\r\n", 5); after line (3) and now it's working.
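For completeness, this is what the result-sending loop looks like with that terminating chunk added (a sketch assembled from the snippets above, using the same mongoose calls):

void server_app::check_results()
{
    while (true)
    {
        TransferData res;
        if (!res_queue->pop(res))
        {
            boost::this_thread::sleep_for(boost::chrono::milliseconds(100));
            continue;
        }
        mg_printf_data(res.first, "%s", res.second); // chunked response body
        mg_write(res.first, "0\r\n\r\n", 5);         // terminating chunk
    }
}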
I get lots of ERROR_INTERNET_CANNOT_CONNECT (code 12029) in the callback procedure of the request.
I use WinHttp in async mode (on a server). How do you cleanly close the connection in this case? Do you just use something like this (as you'd normally close a connection)?
::WinHttpSetStatusCallback(handle, NULL, 0, 0);
::WinHttpCloseHandle(this->handle);
I ask this because I have some strange memory leaks associated with the winhttp DLL that occur in the situation described (I want to create hundreds of concurrent connections, which are probably blocked by the firm's internal firewall, or else the destination server drops them). I have already looked at the documentation of WinHttpCloseHandle on MSDN...
Here is how I do handling of the callback states:
template <typename T>
void WinHttp::AsyncRequest<T>::OnCallback(DWORD code, const void* info, DWORD length)
{
T* pT = static_cast<T*>(this);
switch (code)
{
case WINHTTP_CALLBACK_STATUS_SENDREQUEST_COMPLETE:
case WINHTTP_CALLBACK_STATUS_WRITE_COMPLETE:
{
HRESULT result = pT->OnWriteData();
if (FAILED(result))
{
throw CommunicationException(::GetLastError());
}
if (S_FALSE == result)
{
if (!::WinHttpReceiveResponse(handle, 0)) // reserved
{
throw CommunicationException(::GetLastError());
}
}
break;
}
case WINHTTP_CALLBACK_STATUS_HEADERS_AVAILABLE:
{
DWORD statusCode;
DWORD statusCodeSize = sizeof(DWORD);
if (!::WinHttpQueryHeaders(handle, WINHTTP_QUERY_STATUS_CODE | WINHTTP_QUERY_FLAG_NUMBER, WINHTTP_HEADER_NAME_BY_INDEX, &statusCode, &statusCodeSize, WINHTTP_NO_HEADER_INDEX))
{
throw CommunicationException(::GetLastError());
}
pT->OnStatusCodeReceived(statusCode);
if (!::WinHttpQueryDataAvailable(handle, 0))
{
// If a synchronous error occurred, throw. Otherwise
// the query is successful or asynchronous.
throw CommunicationException(::GetLastError());
}
break;
}
case WINHTTP_CALLBACK_STATUS_DATA_AVAILABLE:
{
unsigned int size = *((LPDWORD) info);
if (size == 0)
{
pT->OnResponseComplete(S_OK);
}
else
{
unsigned int sizeToRead = (size <= chunkSize) ? size : chunkSize;
if (!::WinHttpReadData(handle, &buffer[0], sizeToRead, 0)) // async result
{
throw CommunicationException(::GetLastError());
}
}
break;
}
case WINHTTP_CALLBACK_STATUS_READ_COMPLETE:
{
if (length != 0)
{
pT->OnReadComplete(&buffer[0], length);
if (!::WinHttpQueryDataAvailable(handle, 0))
{
// If a synchronous error occurred, throw. Otherwise
// the query is successful or asynchronous.
throw CommunicationException(::GetLastError());
}
}
else
{
pT->OnResponseComplete(S_OK);
}
break;
}
case WINHTTP_CALLBACK_STATUS_REQUEST_ERROR:
{
throw CommunicationException(::GetLastError());
}
}
}
Here buffer is a vector that has reserved 8K once the request is initiated. Thanks in advance.
In OnResponseComplete and OnResponseError I eventually also call:
::WinHttpSetStatusCallback(handle, NULL, 0, 0);
assert(::WinHttpCloseHandle(this->handle));
this->handle = nullptr;
Roman R was right about the issue; I just want to include more details in the response.
Cancellation is tricky. For WinHTTP you need to remember that callbacks fire on a thread pool. They can arrive on a random thread and since the thread pool does not guarantee when the callback is executed, you could call WinHttpSetStatusCallback to clear the callback and then later on still receive callbacks. You thus need to synchronize this yourself. A more subtle problem is that you cannot call back into WinHTTP from a callback. The safest and most reliable way to handle WinHTTP callbacks is to dispatch the callbacks to a single thread. This could be a UI thread or a single-threaded thread pool. You then also need to close the handle first and then wait for the callback to indicate that the handle is closed (WINHTTP_CALLBACK_STATUS_HANDLE_CLOSING) – only then is it safe to release any connection-specific resources, callbacks, etc.
My tests so far have shown that closing the request handle must be done outside the callbacks, as must the final closing of the connection handle. One can queue up the intent to close the request, and afterwards the connection close and other cleanup, on a separate thread using QueueUserApc, so that at least those two operations are synchronized.
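A sketch of that close sequence (the event member and type name are hypothetical, and the callback must have been registered with a flag set that includes WINHTTP_CALLBACK_STATUS_HANDLE_CLOSING, e.g. WINHTTP_CALLBACK_FLAG_ALL_NOTIFICATIONS):

void CALLBACK StatusCallback(HINTERNET, DWORD_PTR context, DWORD code,
                             LPVOID, DWORD)
{
    auto* self = reinterpret_cast<AsyncRequestBase*>(context); // hypothetical type
    if (code == WINHTTP_CALLBACK_STATUS_HANDLE_CLOSING)
    {
        // guaranteed to be the last callback for this handle
        ::SetEvent(self->m_closed_event);
        return;
    }
    // ... dispatch all other codes to a single thread, per the advice above ...
}

void AsyncRequestBase::Close()
{
    ::WinHttpCloseHandle(m_handle);                  // triggers HANDLE_CLOSING
    ::WaitForSingleObject(m_closed_event, INFINITE); // wait for that callback
    m_handle = nullptr;
    // only now is it safe to free buffers and other per-request state
}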
EDIT
The problem still remains. What I wrote below did not fix the issue.
The problem described was actually not so much related to ERROR_CANNOT_CONNECT as to the way I was incorrectly handling the async status callback notifications of winhttp. If anyone is interested, I will copy here a typical way of handling the status notifications.