Cross-platform notification that a service is running - C++

I am looking for a cross-platform method of notifying several client applications that a service/daemon has started and is able to handle incoming connections. The clients will be running all the time, whereas the service may not. Normally the service/daemon will be started automatically when the computer starts, but in some cases it may not, and in those cases the clients should automatically connect when the service/daemon starts.
The basic flow of the clients is to wait until they notice that the service is running, then connect. If the connection is interrupted, or if they were unable to connect, they just try again from the beginning.
For Windows I have a solution where the service signals a global event object when it starts so that the clients can wait on this event. This works OK in practice, but I am pretty sure that it does not handle all potential cases (such as a crashing service or multiple instances of the service running). I don't mind if the clients "accidentally" wake up every now and then even though the service isn't running. I just want to avoid the clients entering a busy loop where they try to connect all the time, while still responding fairly quickly when the service starts; i.e. just adding a sleep between connection attempts is not great.
Is there a cross-platform method to detect whether the service is running and ready to accept connections?
Update: I'll add a bit more information on how the current mechanism works on Windows using approximate code from memory, so please excuse any typos:
Service:
SECURITY_ATTRIBUTES sa;
// Set up empty SECURITY_ATTRIBUTES so that everyone has access
// ...
// Create a manual reset event, initially set to nonsignaled
HANDLE event = ::CreateEvent(&sa, TRUE, FALSE, "Global\\unique_name");
// Signal the event - service is running and ready
::SetEvent(event);
// Handle connections, do work
// If the service dies for whatever reason, Windows deletes the event handle
// The event is deleted when the last open handle to it is closed
// So the event is signaled for at least as long as the service lives
Clients:
while (true) {
    // Set up the event the same way as the service, including empty security attributes
    // ...
    HANDLE event = ::CreateEvent(&sa, TRUE, FALSE, "Global\\unique_name");
    // Wait for the service to start
    DWORD ret = ::WaitForSingleObject(event, INFINITE);
    // Close the handle to avoid keeping the event object alive
    // This isn't enough in theory, but works in real usage as the number of clients
    // will always be low
    ::CloseHandle(event);
    // Check if we woke up because the event is signaled
    if (WAIT_OBJECT_0 == ret) {
        // connect to service, do work
        // ...
    }
}
How could I achieve approximately the same on OS X and Linux?

Related

Difference between sync and async gRPC

I am working on a service based on gRPC that requires high throughput, but currently my program suffers from low throughput when using C++ synchronous gRPC.
I've read through the gRPC documentation, but I can't find an explicit explanation of the difference between the sync and async APIs, other than that the async API gives control over the completion queue, which is transparent to the sync API.
I want to know whether synchronous gRPC sends a message to the TCP layer and waits for its "ack", so that the next message is blocked, while the async API sends messages asynchronously without later messages waiting?
TLDR: Yes, the async API sends messages asynchronously without later messages waiting, while the synchronous API blocks the whole thread while one message is being sent or received.
gRPC uses CompletionQueue for its asynchronous operations. You can find the official tutorial here: https://grpc.io/docs/languages/cpp/async/
A CompletionQueue is an event queue. An "event" here can be the completion of receiving request data, the expiry of an alarm (timer), etc. (basically, the completion of any asynchronous operation).
Using the official gRPC asynchronous API example, focus on the CallData class and HandleRpcs():
void HandleRpcs() {
  // Spawn a new CallData instance to serve new clients.
  new CallData(&service_, cq_.get());
  void* tag;  // uniquely identifies a request.
  bool ok;
  while (true) {
    // Block waiting to read the next event from the completion queue. The
    // event is uniquely identified by its tag, which in this case is the
    // memory address of a CallData instance.
    // The return value of Next should always be checked. This return value
    // tells us whether there is any kind of event or cq_ is shutting down.
    GPR_ASSERT(cq_->Next(&tag, &ok));
    GPR_ASSERT(ok);
    static_cast<CallData*>(tag)->Proceed();
  }
}
HandleRpcs() is the main loop of the server. It is an infinite loop which continuously gets the next event from the completion queue using cq->Next() and calls its Proceed() method (our custom method that processes a client request through its different states).
The CallData class (an instance of which represents the complete processing cycle of one client request):
class CallData {
 public:
  // Take in the "service" instance (in this case representing an asynchronous
  // server) and the completion queue "cq" used for asynchronous communication
  // with the gRPC runtime.
  CallData(Greeter::AsyncService* service, ServerCompletionQueue* cq)
      : service_(service), cq_(cq), responder_(&ctx_), status_(CREATE) {
    // Invoke the serving logic right away.
    Proceed();
  }

  void Proceed() {
    if (status_ == CREATE) {
      // Make this instance progress to the PROCESS state.
      status_ = PROCESS;
      // As part of the initial CREATE state, we *request* that the system
      // start processing SayHello requests. In this request, "this" acts as
      // the tag uniquely identifying the request (so that different CallData
      // instances can serve different requests concurrently), in this case
      // the memory address of this CallData instance.
      service_->RequestSayHello(&ctx_, &request_, &responder_, cq_, cq_,
                                this);
    } else if (status_ == PROCESS) {
      // Spawn a new CallData instance to serve new clients while we process
      // the one for this CallData. The instance will deallocate itself as
      // part of its FINISH state.
      new CallData(service_, cq_);
      // The actual processing.
      std::string prefix("Hello ");
      reply_.set_message(prefix + request_.name());
      // And we are done! Let the gRPC runtime know we've finished, using the
      // memory address of this instance as the uniquely identifying tag for
      // the event.
      status_ = FINISH;
      responder_.Finish(reply_, Status::OK, this);
    } else {
      GPR_ASSERT(status_ == FINISH);
      // Once in the FINISH state, deallocate ourselves (CallData).
      delete this;
    }
  }

 private:
  // The means of communication with the gRPC runtime for an asynchronous
  // server.
  Greeter::AsyncService* service_;
  // The producer-consumer queue for asynchronous server notifications.
  ServerCompletionQueue* cq_;
  // Context for the rpc, allowing customization of aspects such as the use
  // of compression and authentication, as well as sending metadata back to
  // the client.
  ServerContext ctx_;
  // What we get from the client.
  HelloRequest request_;
  // What we send back to the client.
  HelloReply reply_;
  // The means to get back to the client.
  ServerAsyncResponseWriter<HelloReply> responder_;
  // Let's implement a tiny state machine with the following states.
  enum CallStatus { CREATE, PROCESS, FINISH };
  CallStatus status_;  // The current serving state.
};
As we can see, a CallData has three states: CREATE, PROCESS and FINISH.
A request routine looks like this:
1. At startup, preallocate one CallData for a future incoming client.
During the construction of that CallData object, service_->RequestSayHello(&ctx_, &request_, &responder_, cq_, cq_, this) gets called, which tells gRPC to prepare for the reception of exactly one SayHello request.
At this point we don't know where the request will come from or when it will come; we are just telling gRPC that we are ready to process one when it actually arrives, and asking gRPC to notify us when that happens.
The arguments to RequestSayHello tell gRPC where to put the context, request body and responder of the request after receiving one, as well as which completion queue to use for the notification and what tag should be attached to the notification event (in this case, this is used as the tag).
2. HandleRpcs() blocks on cq->Next(), waiting for an event to occur.
some time later....
3. A client makes a SayHello request to the server; gRPC starts receiving and decoding that request. (IO operation)
some time later....
4. gRPC has finished receiving the request. It puts the request body into the request_ field of the CallData object (via the pointer supplied earlier), then creates an event (with the pointer to the CallData object as the tag, as requested earlier by the last argument to RequestSayHello). gRPC then puts that event into the completion queue cq_.
5. The loop in HandleRpcs() receives the event (the previously blocked call to cq->Next() returns now) and calls CallData::Proceed() to process the request.
6. status_ of the CallData is PROCESS, so it does the following:
6.1. Creates a new CallData object, so that new client requests after this one can be processed.
6.2. Generates the reply for the request and tells gRPC that we have finished processing and it should send the reply back to the client.
6.3. gRPC starts transmission of the reply. (IO operation)
6.4. The loop in HandleRpcs() goes into the next iteration and blocks on cq->Next() again, waiting for a new event to occur.
some time later....
7. gRPC has finished transmission of the reply and tells us by, again, putting an event into the completion queue with a pointer to the CallData as the tag.
8. cq->Next() receives the event and returns; CallData::Proceed() deallocates the CallData object (using delete this;). HandleRpcs() loops and blocks on cq->Next() again, waiting for a new event.
It might look like this process is largely the same as with the synchronous API, just with extra access to the completion queue. However, by doing it this way, at each and every some time later.... (usually spent waiting for an IO operation to complete or for a request to arrive), cq->Next() can actually receive completion events not only for this request, but for other requests as well.
So if a new request comes in while the first request is, say, waiting for the transmission of its reply data to finish, cq->Next() will get the event emitted by the new request and start processing the new request immediately and concurrently, instead of waiting for the first request to finish its transmission.
The synchronous API, on the other hand, will always wait for the full completion of one request (from starting to receive it to finishing the reply) before even starting to receive another one. This means near 0% CPU utilization while receiving the request body and sending back the reply data (IO operations). Precious CPU time that could have been used to process other requests is wasted on just waiting.
This is really bad: if a client with a bad internet connection (100 ms round trip) sends a request to the server, we have to spend at least 200 ms for every request from this client just actively waiting for the TCP transmission to finish. That would bring our server performance down to only ~5 requests per second.
Whereas if we use the asynchronous API, we simply don't actively wait for anything. We tell gRPC: "please send this data to the client, but we will not wait for you to finish here. Instead, just drop a note in the completion queue when you are done, and we'll check it later", and move on to process other requests.
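For contrast, a synchronous handler for the same Greeter service is just a blocking method override. The sketch below follows the shape of the standard helloworld example (the listening address is illustrative and the usual includes/using-declarations from the example are assumed); the thread that enters SayHello is occupied for the whole lifetime of the RPC:
class GreeterServiceImpl final : public Greeter::Service {
  // With the synchronous API the gRPC runtime invokes this on one of its own
  // threads, and that thread is tied up until the method returns.
  Status SayHello(ServerContext* context, const HelloRequest* request,
                  HelloReply* reply) override {
    reply->set_message("Hello " + request->name());
    return Status::OK;
  }
};

void RunSyncServer() {
  GreeterServiceImpl service;
  ServerBuilder builder;
  builder.AddListeningPort("0.0.0.0:50051", grpc::InsecureServerCredentials());
  builder.RegisterService(&service);
  std::unique_ptr<Server> server(builder.BuildAndStart());
  server->Wait();  // serve until the process is shut down
}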
Related information
You can see how a simple server is written with both the synchronous API and the asynchronous API in the official gRPC C++ examples.
Best performance practices
The best practice suggested by the gRPC C++ Performance Notes is to spawn a number of threads equal to your CPU core count and use one CompletionQueue per thread.
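A rough sketch of that setup, reusing the CallData class from above (the thread count, listening address and shutdown handling are simplified assumptions here, and the usual includes plus <thread> and <vector> are assumed):
void RunServer() {
  Greeter::AsyncService service;
  ServerBuilder builder;
  builder.AddListeningPort("0.0.0.0:50051", grpc::InsecureServerCredentials());
  builder.RegisterService(&service);

  // One completion queue per worker thread, one thread per CPU core.
  unsigned num_threads = std::thread::hardware_concurrency();
  if (num_threads == 0) num_threads = 1;
  std::vector<std::unique_ptr<ServerCompletionQueue>> cqs;
  for (unsigned i = 0; i < num_threads; ++i)
    cqs.push_back(builder.AddCompletionQueue());

  std::unique_ptr<Server> server(builder.BuildAndStart());

  std::vector<std::thread> workers;
  for (auto& cq : cqs) {
    ServerCompletionQueue* queue = cq.get();
    workers.emplace_back([&service, queue] {
      // Seed this queue with one pending call, then run the usual event loop.
      new CallData(&service, queue);
      void* tag;
      bool ok;
      while (queue->Next(&tag, &ok)) {
        if (ok) static_cast<CallData*>(tag)->Proceed();
      }
    });
  }
  for (auto& t : workers) t.join();  // proper Shutdown/drain logic omitted
}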

What notification is provided for a lost connection in a C++ gRPC async server

I have an async gRPC server for Windows written in C++. I’d like to detect the loss of connection to a client – whether a network connection is lost, or the client crashes, etc. I see references to the keepalive channel arguments, and I’ve tried various combinations of those settings, such as:
builder.AddChannelArgument(GRPC_ARG_KEEPALIVE_TIME_MS, 10000);
builder.AddChannelArgument(GRPC_ARG_KEEPALIVE_TIMEOUT_MS, 10000);
builder.AddChannelArgument(GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS, 1);
builder.AddChannelArgument(GRPC_ARG_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS, 9000);
builder.AddChannelArgument(GRPC_ARG_HTTP2_BDP_PROBE, 1);
I've done some testing with a streaming RPC method. If I kill the client process and then try to send data to the client, the lost connection is detected. I don't actually even have to send data. I can set an Alarm object to trigger immediately and that causes the call handler to be cancelled. However, if I don't try to send data (or set an alarm) after killing the client process then there's no notification or callback that I've been able to find/enable. I must not have a complete understanding. So:
How does the detection of a lost connection manifest itself for the server? Is there a callback method, or notification of some type? My server doesn’t receive any errors; the completion queue’s ‘Next()’ method never returns, etc.
Does this detection work for both unary (call/response) and streaming methods?
Does the server detection of a lost connection work whether or not the client has implemented lost connection / keepalive logic?
Is there some method besides the keepalive channel arguments that is preferred?
Thanks - any help is appreciated.
You can use ServerContext::AsyncNotifyWhenDone() to get a notification when the request has been cancelled.
https://grpc.github.io/grpc/cpp/classgrpc__impl_1_1_server_context_base.html#a0f1289f31257e6dbef57bc901bd7b5f2
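A sketch of how that could be wired into an async handler such as the CallData class from the earlier answer. The member name done_tag_ is illustrative; the important points are that AsyncNotifyWhenDone() must be registered before the RPC is delivered, and that the tag you pass will later come out of the completion queue, so your dispatch loop must be able to tell it apart from the tags used for the normal request/finish events:
// In the CallData constructor, before RequestSayHello() is issued:
// done_tag_ is a hypothetical member used only for this notification.
ctx_.AsyncNotifyWhenDone(&done_tag_);

// Later, when done_tag_ comes back from cq_->Next(), check why the call ended:
if (ctx_.IsCancelled()) {
  // The client disconnected, crashed, or the deadline expired before the
  // call completed normally; clean up any per-call resources here.
}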

Handling Windows Cryptographic Services (cryptsvc) dependency

I have a Windows service, say test, which has a dependency on cryptsvc. On some systems (like Windows XP), cryptsvc starts later than my service.
One way to handle this is to add cryptsvc as a dependency of my test service, but that delays the start of the test service as well.
I tried manually starting cryptsvc using StartService() as part of my service initialization, something like the code below:
SERVICE_STATUS Status;
Status.dwCurrentState = SERVICE_START_PENDING;
SetServiceStatus(hTestService, &Status);
ServiceInit();
Status.dwCurrentState = SERVICE_RUNNING;
SetServiceStatus(hTestService, &Status);
And
ServiceInit()
{
    // launch a worker thread that
    // calls StartService() to start cryptsvc.
}
But the call to StartService() seems to block for some time and eventually fails with error 1056 (an instance of the service is already running).
How can I ensure the crypto service starts as early as possible, or how can I start it as part of my service initialization? Note that I don't strictly need the crypto service to come up before my service starts, but it should be up as soon as possible.
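For what it's worth, a worker thread along the lines described above might look roughly like this (the function name is illustrative, error handling is trimmed, and error 1056 / ERROR_SERVICE_ALREADY_RUNNING is treated as success since it just means the service is already running or starting):
DWORD WINAPI StartCryptSvcThread(LPVOID)
{
    SC_HANDLE scm = ::OpenSCManagerW(NULL, NULL, SC_MANAGER_CONNECT);
    if (!scm)
        return ::GetLastError();

    SC_HANDLE svc = ::OpenServiceW(scm, L"CryptSvc", SERVICE_START);
    if (svc)
    {
        if (!::StartServiceW(svc, 0, NULL) &&
            ::GetLastError() != ERROR_SERVICE_ALREADY_RUNNING)
        {
            // Some other failure - log it, but don't block our own startup.
        }
        ::CloseServiceHandle(svc);
    }
    ::CloseServiceHandle(scm);
    return 0;
}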

Correct way to register for pre-shutdown notification from C++

I am writing a local service application in C++ and I can't find the correct way of registering for a pre-shutdown notification (for OSes later than Windows XP). I believe the SERVICE_CONTROL_PRESHUTDOWN notification was added in Vista, but when calling SetServiceStatus do we need to specify:
dwServiceStatus.dwControlsAccepted = SERVICE_ACCEPT_PRESHUTDOWN;
or
dwServiceStatus.dwControlsAccepted = SERVICE_ACCEPT_SHUTDOWN | SERVICE_ACCEPT_PRESHUTDOWN;
You cannot accept both a shutdown and a preshutdown if your service is correctly coded. The documentation explicitly states this.
From http://msdn.microsoft.com/en-us/library/windows/desktop/ms683241(v=vs.85).aspx:
Referring to SERVICE_CONTROL_PRESHUTDOWN:
A service that handles this notification blocks system shutdown until the service stops or the preshutdown time-out interval specified through SERVICE_PRESHUTDOWN_INFO expires.
In the same page, the section about SERVICE_CONTROL_SHUTDOWN adds:
Note that services that register for SERVICE_CONTROL_PRESHUTDOWN notifications cannot receive this notification because they have already stopped.
So, the correct way is to set the dwControlsAccepted to include either SERVICE_ACCEPT_SHUTDOWN or SERVICE_ACCEPT_PRESHUTDOWN, depending on your needs, but not to both at the same time.
But do note that you probably want to accept more controls. You should always allow at least SERVICE_CONTROL_INTERROGATE, and almost certainly allow SERVICE_CONTROL_STOP, since without the latter the service cannot be stopped (e.g. in order to uninstall the software) and the process will have to be forcibly terminated (i.e. killed).
As noted by the commenters above, you will need to choose either SERVICE_ACCEPT_SHUTDOWN or SERVICE_ACCEPT_PRESHUTDOWN (Vista or later). If you are using SERVICE_ACCEPT_PRESHUTDOWN, you will need to register your service with the SCM using RegisterServiceCtrlHandlerEx instead of RegisterServiceCtrlHandler, otherwise you will not receive the pre-shutdown notifications. The handler prototype also changes from Handler to HandlerEx.
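A minimal sketch of that registration (the service name and globals are placeholders, error checking and the rest of the service plumbing are omitted):
SERVICE_STATUS_HANDLE g_statusHandle = NULL;
SERVICE_STATUS        g_status = {};

DWORD WINAPI HandlerEx(DWORD control, DWORD eventType, LPVOID eventData, LPVOID context)
{
    switch (control)
    {
    case SERVICE_CONTROL_PRESHUTDOWN:
    case SERVICE_CONTROL_STOP:
        // Report SERVICE_STOP_PENDING and begin an orderly shutdown (see the
        // other answers about doing the actual cleanup outside the handler,
        // bumping dwCheckPoint while it is in progress).
        g_status.dwCurrentState = SERVICE_STOP_PENDING;
        g_status.dwCheckPoint++;
        ::SetServiceStatus(g_statusHandle, &g_status);
        return NO_ERROR;
    case SERVICE_CONTROL_INTERROGATE:
        return NO_ERROR;
    default:
        return ERROR_CALL_NOT_IMPLEMENTED;
    }
}

void WINAPI ServiceMain(DWORD argc, LPWSTR* argv)
{
    // HandlerEx (registered via RegisterServiceCtrlHandlerEx) is required to
    // receive SERVICE_CONTROL_PRESHUTDOWN.
    g_statusHandle = ::RegisterServiceCtrlHandlerExW(L"MyService", HandlerEx, NULL);

    g_status.dwServiceType = SERVICE_WIN32_OWN_PROCESS;
    g_status.dwControlsAccepted = SERVICE_ACCEPT_STOP | SERVICE_ACCEPT_PRESHUTDOWN;
    g_status.dwCurrentState = SERVICE_RUNNING;
    ::SetServiceStatus(g_statusHandle, &g_status);

    // ... run the service until told to stop ...
}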
Another point to note is that handling pure shutdown events is limited to 5 seconds in Windows Server 2012 (and presumably Windows 8), 12 seconds in Windows 7 and Windows Server 2008, and 20 seconds in Windows XP before your service is killed while stopping. This is the reason why you may need the pre-shutdown notification. You can change this limit via the HKLM\SYSTEM\CurrentControlSet\Control\WaitToKillServiceTimeout registry value.
The comment from alexpi contains a key piece of information: a service handling PRESHUTDOWN needs to update its service status with a new checkpoint number, repeatedly, before WaitToKillServiceTimeout elapses. My server was configured to 5000 ms but my service only updated its status every 12000 ms, so the machine went into the SHUTDOWN phase, which caused my attempt to stop another service to fail with an error saying a shutdown was already in progress.
As I understand the documentation, these two notifications are different. If all you need is for your service to receive the pre-shutdown notification, you should go with dwServiceStatus.dwControlsAccepted = SERVICE_ACCEPT_PRESHUTDOWN; but if you also want your service to receive shutdown notifications, you should go with your second option.

Stopping Windows service asynchronously

I am trying to control a service from within an application. Starting the service via StartService (MSDN) works fine; the service needs about 10 seconds to start, but StartService gives control back to the main application immediately.
However, when stopping the service via ControlService (MSDN) - AFAIK there is no StopService - it blocks the main application for the whole time until the service is stopped, which also takes about 10 seconds.
Start: StartServiceW( handle, 0, NULL)
Stop: ControlService( handle, SERVICE_CONTROL_STOP, status )
Is there a way to stop a Windows service in a non-blocking / asynchronous fashion?
I would probably look at stopping the service in a new thread. That will eliminate the blocking of your main thread.
The SCM processes control requests in a serialized manner. If any service is busy processing a control request, ControlService() will block until the SCM can process the new request. The documentation states as much:
The SCM processes service control notifications in a serial fashion—it will wait for one service to complete processing a service control notification before sending the next one. Because of this, a call to ControlService will block for 30 seconds if any service is busy handling a control code. If the busy service still has not returned from its handler function when the timeout expires, ControlService fails with ERROR_SERVICE_REQUEST_TIMEOUT.
The service is doing its cleanup in its control handler routine. That's OK for a service that will only take a fraction of a second to exit, but a service that's going to take ten seconds should definitely be setting a status of STOP_PENDING and then cleaning up asynchronously.
If this is your own service, you should correct that problem. I'd start by making sure that all of the cleanup is really necessary; for example, there's no need to free memory before stopping (unless the service is sharing a process with other services). If the cleanup really can't be made fast enough, launch a separate thread (or signal your main thread) to perform the service shutdown and set the service status to STOP_PENDING.
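In code, that might look roughly like this inside the control handler (the globals, the wait hint, and the stop event are illustrative; the actual cleanup runs on the service's own worker thread):
case SERVICE_CONTROL_STOP:
    // Acknowledge immediately instead of cleaning up here.
    g_status.dwCurrentState = SERVICE_STOP_PENDING;
    g_status.dwWaitHint = 15000;   // rough estimate of how long cleanup takes
    g_status.dwCheckPoint = 1;
    ::SetServiceStatus(g_statusHandle, &g_status);
    ::SetEvent(g_stopEvent);       // tell the worker thread to shut down
    return NO_ERROR;

// Later, on the worker thread, once cleanup has finished:
//     g_status.dwCurrentState = SERVICE_STOPPED;
//     ::SetServiceStatus(g_statusHandle, &g_status);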
If this is someone else's service, the only solution is to issue the stop request from a separate thread or in a subprocess.
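And for the case where the service is not yours, a sketch of issuing the stop from a separate thread (the helper name is illustrative; the service handle must remain valid until the thread completes):
#include <windows.h>
#include <thread>

std::thread StopServiceAsync(SC_HANDLE hService)
{
    // ControlService() may block for up to ~30 seconds while the SCM and the
    // target service are busy, so run it on its own thread.
    return std::thread([hService] {
        SERVICE_STATUS status = {};
        if (!::ControlService(hService, SERVICE_CONTROL_STOP, &status))
        {
            // e.g. ERROR_SERVICE_REQUEST_TIMEOUT if the service's handler is stuck
            DWORD err = ::GetLastError();
            (void)err;
        }
        // Optionally poll QueryServiceStatusEx() here until SERVICE_STOPPED.
    });
}

// Usage: keep the thread object around and join it before closing the handle.
//     std::thread stopper = StopServiceAsync(handle);
//     ... keep the main loop / UI responsive ...
//     stopper.join();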