I'm having some trouble setting uptime checks for some Cloud Run services that don't allow unauthenticated invocations.
For context, I'm using Cloud Endpoints + ESPv2 as an API gateway that's connected to a few Cloud Run services.
The ESPv2 container/API gateway allows unauthenticated invocations, but the underlying Cloud Run services do not (since requests to these backends flow via the API gateway).
Each Cloud Run service has an internal health check endpoint that I'd like to hit periodically via Cloud Monitoring uptime checks.
This ensures that my Cloud Run services are healthy, and has the added benefit of reducing cold start times, since the containers are kept 'warm'.
However, since the protected Cloud Run services expect a valid authorisation header, all of the requests from Cloud Monitoring fail with a 403.
From the Cloud Monitoring UI, it looks like you can only configure a static auth header, which won't work in this case. I need to be able to dynamically create an auth header per request sent from Cloud Monitoring.
I can see that Cloud Scheduler supports this already. I have a few internal endpoints on the Cloud Run services (that aren't exposed via the API gateway) that are hit via Cloud Scheduler, and I am able to configure an OIDC auth header on each request. Ideally, I'd be able to do the same with Cloud Monitoring.
I can see a few workarounds for this, but all of them are less than ideal:
1. Allow unauthenticated invocations for the underlying Cloud Run services. This will make my internal services publicly accessible and then I will have to worry about handling auth within each service.
2. Expose the internal endpoints via the API gateway/ESPv2. This is effectively the same as the previous workaround.
3. Expose the internal endpoints via the API gateway/ESPv2 AND configure some sort of auth. This sort of works, but at the time of writing the only auth methods supported by ESPv2 are API keys and JWT. JWT is already out of the question, but I guess an API key would work. Again, this requires a bit of setup which I'd rather avoid if possible.
Would appreciate any thoughts/advice on this.
Thanks!
A simpler solution that may work for your use case is a TCP uptime check on port 443:
1. Create your own Cloud Run service using https://cloud.google.com/run/docs/quickstarts/prebuilt-deploy.
2. Create a new TCP uptime check on port 443 against the Cloud Run URL.
3. Wait a couple of minutes.
Location results: All locations passed
Virginia OK
Oregon OK
Iowa OK
Belgium OK
Singapore OK
Sao Paulo OK
I would also point out that Cloud Run is a fully managed Google product with a 99.95% monthly uptime SLA and no incidents in the past few months, but proactively monitoring this on your end is a very good thing too.
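To make it concrete what a TCP check on port 443 does and does not verify, here is a minimal Python sketch of the same probe. The function name `tcp_check` is made up for illustration, and this is not how Cloud Monitoring implements its checks. It only confirms that the port accepts connections; it does not call the service's health check endpoint or send any auth header.

```python
import socket


def tcp_check(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Usage would look like `tcp_check("YOUR-SERVICE.a.run.app")`, where the hostname is a placeholder for the host part of your Cloud Run URL.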
Related
I installed managed Anthos on a GKE cluster. Anthos Service Mesh is working and is displaying my API. Thanks to that, Services in Monitoring automatically detects my API. This is great as it enables me to easily set SLOs and error budgets for my API.
However, I would like to be able to easily set SLOs for individual endpoints in my API. Services (in Monitoring) detects only my API and not the endpoints within it (my API is one pod/container + sidecar). I tried to add endpoints to Services in Monitoring, but it looks like it is only possible to add Kubernetes objects there.
Is there a way to use Services in Monitoring with endpoints? Is the only way to do so to break endpoints to separate microservices?
You can monitor your endpoints using Cloud Endpoints with OpenAPI, which allows you to monitor the health of APIs you own by using the logs and metrics Cloud Endpoints maintains for you automatically. When users make requests to your API, Endpoints logs information about the requests and responses and also tracks three of the four golden signals of monitoring: latency, traffic, and errors. These usage and performance metrics help you monitor your API.
See Configuring Cloud Endpoints for the configuration process, Monitoring your API as a reference on monitoring your API, and the Cloud Endpoints overview.
According to the "Authenticating service-to-service" documentation for Cloud Run, a service that is called by Pub/Sub or Cloud Scheduler must have unauthenticated access disabled, because both rely on HTTP calls (a consequence of Cloud Run services scaling to zero).
My services allow internal and Load Balancer traffic and must be publicly available for frontend clients, but they also must be able to communicate with each other privately with Pub/Sub.
Is there a way to achieve this? It feels unnatural to create a separate private service just for using Pub/Sub.
It's a missing piece. You can't plug Pub/Sub push subscriptions or Cloud Scheduler into your VPC (nor Cloud Tasks, Cloud Build, Workflows, ...). I asked Google Cloud a few months ago, and it should be fixed soon by a new networking feature. At least in 2021!
So, in your case, if your Cloud Run service is accessible from the public internet through a Load Balancer, you can use this public endpoint to call the path that you want on your service and thus perform the process.
If your Cloud Run service is only accessible with ingress=internal, you can't do this for now.
I have a service listening on 'https://myapp.a.run.app/dosomething', but I want to leverage the scalability features of Cloud Run, so in the controller for 'dosomething', I send off 10 requests to 'https://myapp.a.run.app/smalltask'; with my app configured to allow servicing of only one request per instance, I expect 10 instances to spin up, all do their smalltask, and return (all within the timeout period).
But I don't know how to properly authenticate the request, so those 10 requests all result in 403's. For Cloud Run services, I manually pass in a bearer token with the initial request, though I expect to add some api proxy at some point. But without said API proxy, what's the right way to send the request such that it is accepted? The app is running as a user that does have permissions to access the endpoint.
Authenticating service-to-service
If your architecture is using multiple services, these services will likely need to communicate with each other.
You can use synchronous or asynchronous service-to-service communication:
For asynchronous communication, use
Cloud Tasks for one to one asynchronous communication
Pub/Sub for one to many asynchronous communication
Cloud Scheduler for regularly scheduled asynchronous communication.
Cloud Workflows for orchestration services.
For synchronous communication
One service invokes another one over HTTP using its endpoint URL. In this use case, it's a good idea to ensure that each service is only able to make requests to specific services. For instance, if you have a login service, it should be able to access the user-profiles service, but it probably shouldn't be able to access the search service.
First, you'll need to configure the receiving service to accept requests from the calling service:
Grant the Cloud Run Invoker (roles/run.invoker) role to the calling service identity on the receiving service. By default, this identity is PROJECT_NUMBER-compute@developer.gserviceaccount.com.
In the calling service, you'll need to:
Create a Google-signed OAuth ID token with the audience (aud) set to the URL of the receiving service. This value must contain the schema prefix (http:// or https://) and custom domains are currently not supported for the aud value.
Include the ID token in an Authorization: Bearer ID_TOKEN header. You can get this token from the metadata server, while the container is running on Cloud Run (fully managed). If the application is running outside Google Cloud, you can generate an ID token from a service account key file.
For a full guide and examples in Node/Python/Go/Java and others see: Authenticating service-to-service
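The two calling-side steps above can be sketched in Python with only the standard library. This is a minimal illustration, not the code from the linked guide: the helper names (`identity_token_request`, `fetch_id_token`, `authorized_request`) are made up, and `fetch_id_token` only works where the metadata server is reachable (for example, inside a Cloud Run container).

```python
import urllib.parse
import urllib.request
from typing import Optional

# Metadata server endpoint that returns a Google-signed ID token.
METADATA_IDENTITY_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/identity"
)


def identity_token_request(audience: str) -> urllib.request.Request:
    """Build the metadata-server request for an ID token with the given audience."""
    query = urllib.parse.urlencode({"audience": audience})
    return urllib.request.Request(
        f"{METADATA_IDENTITY_URL}?{query}",
        headers={"Metadata-Flavor": "Google"},  # required by the metadata server
    )


def fetch_id_token(audience: str, timeout: float = 5.0) -> str:
    """Fetch the ID token; only works where the metadata server is reachable."""
    req = identity_token_request(audience)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def authorized_request(
    url: str, audience: str, id_token: Optional[str] = None
) -> urllib.request.Request:
    """Build a request to the receiving service carrying the ID token as a bearer token."""
    token = id_token if id_token is not None else fetch_id_token(audience)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

For the fan-out case in the question, the audience would be the receiving service's URL (e.g. https://myapp.a.run.app) and `url` would be the /smalltask path on it. The same token can be reused for all 10 requests.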
On a GCP compute environment, I need to get an id_token (expires every 3600s) to make service-to-service authentication (using GCF, Cloud Run etc).
I get this id_token from the instance metadata service:
http://metadata/computeMetadata/v1/instance/service-accounts/default/identity?audience=[...]
Instead of implementing some form of caching + TTL for this identity token, I'd like to know whether I can call this endpoint every time I make an outbound RPC (I might make a lot).
Specifically:
What's the rate limit for the instance metadata service?
Does the rate limit differ between GCE vs serverless platforms (GAE, GCF, Cloud Run)?
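For comparison, the caching + TTL alternative mentioned above is fairly small. The sketch below is one possible shape, under these assumptions: the class name `IdTokenCache`, the 300-second refresh margin, and the injected `fetch` callable (whatever function calls the metadata endpoint) are all hypothetical. The expiry is read from the token's own `exp` claim without verifying the signature, which is only used here to decide when to refresh.

```python
import base64
import json
import time
from typing import Callable, Dict, Tuple


def jwt_expiry(token: str) -> float:
    """Read the `exp` claim (seconds since epoch) from a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return float(payload["exp"])


class IdTokenCache:
    """Cache one ID token per audience and refetch shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        skew_seconds: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch          # assumed: calls the metadata identity endpoint
        self._skew = skew_seconds    # refresh this long before the real expiry
        self._clock = clock
        self._tokens: Dict[str, Tuple[str, float]] = {}

    def get(self, audience: str) -> str:
        cached = self._tokens.get(audience)
        if cached is not None and self._clock() < cached[1] - self._skew:
            return cached[0]
        token = self._fetch(audience)
        self._tokens[audience] = (token, jwt_expiry(token))
        return token
```

With this in place, each outbound RPC calls `cache.get(audience)` and the metadata endpoint is only hit roughly once per token lifetime per audience. The sketch is not thread-safe.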
I have a custom gRPC backend deployed behind an Endpoints Service Proxy (ESP) connected to Google Cloud Endpoints.
When sending a request with the X-Cloud-Trace-Context header set, I can see the spans recorded by ESP show up in my Stackdriver Trace dashboard.
However, my service is also sending requests to Google Cloud KMS as part of handling that request. I'd like Google Cloud to create trace spans for those sub-requests automatically for me as well; however, attaching the X-Cloud-Trace-Context header that ESP forwarded to me to the sub-requests sent to Cloud KMS does not cause any spans for those sub-requests to show up in Stackdriver Trace. The service account used to connect to Cloud KMS does have the "Stackdriver Trace Agent" role enabled.
Is it possible to tell Google Cloud services (such as Cloud KMS) to automatically generate trace spans for the current request's trace context, or do I need to manually generate traces for these requests in my backend code?
Cloud Trace doesn't currently generate service-side traces for requests to most GCP services, although we're aware of it as a valuable feature. To track how much of your latency is being consumed by KMS (or other services) you can create a client-side trace record using OpenCensus (Github) or similar.
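As a rough illustration of what a client-side trace record captures, here is a standard-library Python sketch that parses the incoming X-Cloud-Trace-Context value (format TRACE_ID/SPAN_ID;o=OPTIONS) and times a block of code. It is not the OpenCensus API and does not export anything to Cloud Trace; the names `parse_trace_context` and `client_span`, and the in-memory `sink` list, are stand-ins for what a tracing library and its exporter would do.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List, Optional


def parse_trace_context(header: str) -> Dict[str, Optional[str]]:
    """Split 'TRACE_ID/SPAN_ID;o=OPTIONS' into its parts."""
    ids, _, options = header.partition(";")
    trace_id, _, span_id = ids.partition("/")
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id or None,
        "options": options[2:] if options.startswith("o=") else None,
    }


@contextmanager
def client_span(name: str, trace_header: str, sink: List[dict]) -> Iterator[None]:
    """Time the wrapped block and append a span-like record to `sink`."""
    ctx = parse_trace_context(trace_header)
    start = time.monotonic()
    try:
        yield
    finally:
        # Recorded even if the wrapped call raises, so failures are timed too.
        sink.append({
            "name": name,
            "trace_id": ctx["trace_id"],
            "parent_span_id": ctx["parent_span_id"],
            "duration_s": time.monotonic() - start,
        })
```

In the backend, the KMS call would go inside `with client_span("kms.decrypt", incoming_header, spans):`, where the span name and variables are placeholders. A real tracing library would then attach the record to the same trace ID so it shows up alongside the ESP spans.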
Cloud KMS (as of this writing) doesn't support gRPC, but we are working on it.