How does Cloud Run scaling down to zero affect long-computation jobs or external API requests? - google-cloud-platform

I'm new to using Cloud Run and the idea of scaling down to zero is very appealing to me, but I have question about a few scenarios about its usage:
If I have a Cloud Run instance querying an external API endpoint, would the instance winds down while waiting for the response if no additional requests come in (i.e. I set the query time out to 60min, and no requests are received in that 60 min)?
If the Cloud Run instance is running computation that lasts for longer than 24 hour, or perhaps even days, without receiving requests, could it be trusted to carry out the computation until it's done without being randomly shutdown or restarted for servicing or other purposes (I ask this because Cloud Run is primarily intended as for stateless applications, but I have infrequent computation jobs that may take a long time that may be considered "stateful" in short-term context).
Does CPU utilization impact auto-scaling (e.g. if I have a computationally intensive job not configured for distributed computing running on one instance, would this trigger Cloud Run to spawn additional instances?)

If you deep dive in the documentation, I'm quite sure that you can find your answers. So, here a summary
(Interesting read).The Cloud Run instances are shut down only when they aren't in used (usually 15 minutes (can change at any time, no commitment, only observations) without request handling). In your case, if you are in a request handling context, no worries, your instance won't be killed, it is in use! Note: don't send an HTTP response before the end of the processing. Background process/jobs aren't considered in a request context. The context is considered from the receipt of the request to the response (OK or KO) back. Partial response/streaming is accepted.
Cloud run instance can, potentially, live more than 24h, but nothing is guaranteed. And, because the request handling is limited to 1h, you can't run process longer that that. I recommend you to have a look to GKE autopilot or to run a container on a Compute Engine and stop the VM at the end of the processing to save resources and money (or a hack to run your container on AI PLatform custom training; even if you train nothing, you run a custom container on a serverless platform!). If you can, I recommend you to design your workload to be split in several small and parallelizable jobs
Yes, it's described here. But keep in mind that only 1 request is processed on one instance. If you send a request that trigger an intensive compute job, the request will be only processed on the same instance (that can have several CPUs if your workload is compliant with that). And if another request comes in during the intensive processing, another Cloud Run instance will be spawn to handle it; only the new request.

Related

How should I pull from Pub/Sub using Compute Engine MIGs

In my personal case, Pub/Sub's pushes to a Python service on Cloud Functions are being unfeasible due to it's short timeout. So the idea of having a container-based managed instance group of Compute Engine instances sounds good, these instances can scale up/down based on Pub/Sub pending task count metrics. These machines' containers would run Python code on startup, the given code would PULL Pub/Sub and process the pulled job accordingly.
Contextualization aside, the question is: Is it a good idea? Are there any gotchas? As there would be several machines at scale, how could I guarantee that a same given 'queued task' would not be picked and have it's processing started on more than one of these machines? I know about ACKs, but ACKs should just be emitted when the task ends successfully, isn't it? What strategy to use to prevent the initially mentioned and other problems?

Is Google Cloud Tasks suitable for asynchronous user-facing tasks?

Suppose we have a web-service written in python, that does some time-consuming file processing. It definitely should not be run inside the HTTP handler as it takes up to 10 mins to complete. Instead, the processing should be done asynchronously by some sort of workers, and it would be also nice to report the progress of the task execution to display to the user.
Would it be a good idea to setup Google Cloud Tasks with some Cloud Run or Cloud Functions service as a HTTP target to do this work?
Is Google Cloud Tasks suitable for handling this type of async tasks, where the user is sitting and waiting for result?
If not, is there any other options to achieve this with Google Cloud? (or should I use custom task services for this purpose, for instance, celery and redis) It also seems that Cloud Run Jobs features somewhat similar functionality, but there are not any queue systems to manage workers.
GC tasks is simply a tool for queuing tasks. It is a useful tool for ensuring that tasks do run, as it has built in retry mechanisms. How you use that in the context of an application depends on a lot of other detail of the application itself, but it is definitely possible to use it for background/asynchronous processing of tasks.
We use Google Cloud tasks to implement long running processes that report their progress via data store records. Some of these processes run longer than the standard 10 minute timeout, and trigger a new cloud task to complete the processing. We then have a simple lightweight handler that retrieves the status record from data store and reports that to the user. We poll that handler from the client, but you could also implement something like websockets.
GCP can handle Asynchronous tasks, Asynchronous execution is a well-established way to reduce request latency and make your application more responsive.
We can use cloud run or cloud functions for this type of tasks , Because we can increase the time limit upto 30 min in http task handlers in GCP cloud tasks.
For more information refer to this document.
We use Google Cloud tasks to implement long running processes that report their progress via data store records. Some of these processes run longer than the standard 10 minute timeout, and trigger a new cloud task to complete the processing.

Running background processes in Google Cloud Run

I have a lightweight server that runs cron jobs at a given time. As I understand Google Cloud Run only processes incoming requests and then becomes idle after a short time if there is no other request to process. Hence, it is not advisable to deploy that cron service to Cloud Run.
Out of curiosity, I deployed the following server that starts up and then prints a log every hour.
const express = require('express');
const app = express();
setInterval(() => console.log('ping!'), 1000 * 60 * 60);
app.listen(process.env.PORT, () => {
console.log('server listening');
})
I deployed it with a minimum and maximum instance count of 1. It has not received any request and when I checked back the next day, it was precisely printing the log every hour. Was this coincidence or can I use this setup for production?
If you set the min instance to 1 and the CPU always on to true, yes, you can perform background compute intensive processing without CPU Throttling (in your hello world case, you can use the few CPU % allowed to the idle instance without the CPU always on option).
BUT, and the but is very important, you will pay for 1 Cloud Run instance always up. In addition, is you receive request, you can scale up and have more than 1 instance up and running. Does it make sense to have several instances with the same CRON scheduling? (except if you set the max instance to 1).
At the end, the best pattern is to host the scheduling outside, on Cloud Scheduler, and then to query your instance to perform the task. It's serverless, you can handle several task in parallel, it's scalable.
From my understanding no.
From the documentation here, Google indicates that the CPU of idle instances is throttled to nearly zero. I suppose this means that very simple operation can still be performed (e.g. logging a string every hour). I guess you could test it more extensively by doing some more complex operations and evaluate the processing time of these operations.
Either way, I would not count on it in a production environment. There is no guarantee that the CPU "throttled to nearly zero" will be able to complete the operations you need in a reasonable time delay.

How to use Google Cloud PubSub and Run to handle resource-intensive long-running tasks?

I've got a Google Cloud PubSub topic which at times has thousands of messages and at times zero messages coming in. These messages represent tasks which can take upwards of an hour each. Preferably I'm able to use Cloud Run for this, as it scales really well to the demand, if a thousand messages gets published, I want 100s of Cloud Run instances to spin up. These Run instances get started by a push subscription. The problem is that PubSub has a 600 second timeout for the acknowledgement. This means in order to have Cloud Run process these messages they have to finish within 600 seconds. If they do not, PubSub times it out, and sends it again, causing the task to be restarted until the first task finally does acknowledge it (this causes the same task to be ran many times). Cloud Run acknowledges the messages by returning a 2** HTTP status code. The documentation states
When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.
So is it maybe possible to acknowledge a PubSub request through code and continue the processing, without having Google Cloud Run hand over the resources? Or is there a better solution I'm unaware of?
Because these processes are so code/resource-intensive, I feel Cloud Functions will not suffice. I've looked at https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks and https://cloud.google.com/blog/products/gcp/how-google-cloud-pubsub-supports-long-running-workloads. But these didn't answer my question.
I've looked at Google Cloud Tasks, which might be something? But the rest of the project has been built around PubSub/Run/Functions, so preferably I stick with that.
This project is written in Python.
So preferably I would like to write my Google Cloud Run tasks like this:
#app.route('/', methods=['POST'])
def index():
"""Endpoint for Google Cloud PubSub messages"""
pubsub_message = request.get_json()
logger.info(f'Received PubSub pubsub_message {pubsub_message}')
if message_incorrect(pubsub_message):
return "Invalid request", 400 #use normal NACK handling
# acknowledge message here without returning
# ...
# Do actual processing of the task here
# ...
So how can or should I solve this, so that the the resource-intensive tasks get properly scaled on demand ( so a push PubSub subscription ). And the tasks only get executed once.
Answers:
In short what has been answered. Cloud Run and Functions are just not suited for this problem. There is no way to have them do tasks that take longer than 9 or 15 minutes respectively. The only solution is to switch over to another Google Service and use a pull style subscription and lose out on auto-scaling of GC Run/Functions
Cloud Run on GKE can handle long process, more CPU and memory than available on managed platform. However, you have a GKE cluster always running and you loose the "pay-as-you-use" benefit.
If you want to use this solution, don't link directly PubSub push subscription to your Cloud Run on GKE. Use Cloud Task with HTTP job for this. The timeout is longer than PubSub (up to 24h instead of 10 min) and the retry policies are customizables.
Neither Cloud Functions nor Cloud Run is sufficient for arbitrarily long running operations. Cloud Functions has a hard cap of 9 minutes per invocation, and Cloud Run caps at 60. If you need more time, you're going to have to delegate the work to another product, such as Google Compute Engine. It should be possible to kick off some Compute Engine work from one of the serverless products.
Give the limits of pubsub acks, you'll probably have to find a way for a client to be able to poll or listen to some resource to find out when the work is actually done. You could use a database for that, and Cloud Firestore lets you listen to documents to find out when they change. So you could use that to track the status of your long-running work.

How to handle long requests in Google Cloud Run?

I have hosted my node app in Cloud Run and all of my requests served within 300 - 600ms time. But one endpoint that gets data from a 3rd party service so that request takes 1.2s - 2.5s to complete the request.
My doubts regarding this are
Is 1.2s - 2.5s requests suitable for cloud run? Or is there any rule that the requests should be completed within xx ms?
Also see the screenshot, I got a message along with the request in logs "The request caused a new container instance to be started and may thus take longer and use more CPU than a typical request"
What caused a new container instance to be started?
Is there any alternative or work around to handle long requests?
Any advice / suggestions would be greatly appreciated.
Thanks in advance.
I don't think that will be an issue unless you're worried about the cost of the CPU/memory time, which honestly should only matter if you're getting 10k+ requests/day. So, probably doesn't matter and cloud run can handle that just fine (my own app does requests longer than that with no problem)
It's possible that your service was "scaled to zero" meaning that there were no containers left running to serve requests. In that case, it would be necessary to start up a new instance and wait for whatever initializing/startup costs are associated with that process. It's also possible that it was auto-scaled due to all other instances being at their request limits. Make sure that your setting for max concurrent requests per instance is set greater than one - Node/Express can handle multiple requests at once. Plus, you'll only get charged for the total time spend, not per request:
In situations where you get very long (30 seconds, minutes+) operations, it may be a good idea to switch to some different data transfer method. You could use polling, where the client makes a request every 5 seconds and checks if the response is ready. You could also switch to some kind of push-based system like WebSockets, but Cloud Run doesn't have support for that.
TL;DR longer requests (~10-30 seconds) should be fine unless you're worried about the cost of the increased compute time they may occur at scale.