Do Google Cloud background functions have max timeout? - google-cloud-platform

We have been using Google Cloud Functions with HTTP triggers, but ran into the maximum timeout of 540 s.
Our jobs are background jobs, typically data pipelines, with processing times often longer than 9 minutes.
Do background functions have this limit, too? It is not clear to me from the documentation.

All functions have a maximum configurable timeout of 540 seconds.
If you need something to run longer than that, consider delegating that work to run on another product, such as Compute Engine or App Engine.

2nd generation Cloud Functions that are triggered by HTTP can have a maximum timeout of 1 hour instead of the 9-minute (540 s) limit.
See also: https://cloud.google.com/functions/docs/2nd-gen/overview
You can then trigger this 2nd gen Cloud Function with, for example, Cloud Scheduler.
When creating the job in Cloud Scheduler you can set the attempt deadline to 30 minutes. This is the deadline for each job attempt: if an attempt has not finished by then, it is cancelled and considered a failed attempt.
See also: https://cloud.google.com/scheduler/docs/reference/rest/v1/projects.locations.jobs#Job

The maximum run time of 540 seconds applies to all Cloud Functions, no matter how they're triggered. If you want to run something longer you will have to either chop it into multiple parts, or run it on a different platform.
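As a rough illustration of "chopping it into multiple parts", here is a minimal sketch that processes items until a time budget is used up and returns the index to resume from. The helper name `processWithinBudget`, its parameters, and the idea of passing the resume index to the next invocation are assumptions for illustration, not part of any Cloud Functions API.

```javascript
// Hypothetical helper: process items until the time budget runs out,
// then report where the next invocation should resume.
function processWithinBudget(items, startIndex, budgetMs, handleItem, now = Date.now) {
  const deadline = now() + budgetMs;
  let index = startIndex;
  // Stop as soon as either the work or the time budget is exhausted.
  while (index < items.length && now() < deadline) {
    handleItem(items[index]);
    index += 1;
  }
  return { nextIndex: index, done: index >= items.length };
}

// Usage idea (assumption): give each invocation a budget well below 540 s,
// e.g. 480000 ms, and trigger a follow-up invocation with nextIndex
// (for example via a Pub/Sub message) while done is false.
```

The budget is only checked between items, so a single item must itself finish comfortably within the remaining time.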

Related

Running background processes in Google Cloud Run

I have a lightweight server that runs cron jobs at given times. As I understand it, Google Cloud Run only processes incoming requests and then becomes idle after a short time if there is no other request to process. Hence, it is not advisable to deploy that cron service to Cloud Run.
Out of curiosity, I deployed the following server that starts up and then prints a log every hour.
const express = require('express');
const app = express();
setInterval(() => console.log('ping!'), 1000 * 60 * 60);
app.listen(process.env.PORT, () => {
  console.log('server listening');
});
I deployed it with a minimum and maximum instance count of 1. It has not received any request, and when I checked back the next day, it was printing the log precisely every hour. Was this a coincidence, or can I use this setup for production?
Yes: if you set min instances to 1 and enable the "CPU always on" option, you can run compute-intensive background processing without CPU throttling. (In your hello-world case, the few percent of CPU still available to an idle instance is enough even without the "CPU always on" option.)
BUT, and the but is very important, you will pay for 1 Cloud Run instance that is always up. In addition, if you receive requests, the service can scale up and have more than 1 instance running. Does it make sense to have several instances with the same cron schedule? (This cannot happen if you set max instances to 1.)
In the end, the best pattern is to host the scheduling outside, on Cloud Scheduler, and have it call your service to perform the task. It's serverless, it can handle several tasks in parallel, and it's scalable.
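A minimal sketch of that pattern, using only Node's built-in `http` module: the service exposes an endpoint and does the work only when it is called, for example by a Cloud Scheduler HTTP job. The path `/run-task` and the functions `runScheduledTask`, `handleRequest`, and `startServer` are hypothetical names chosen for this example.

```javascript
const http = require('http');

// Hypothetical task: replace with the work the cron job used to do.
function runScheduledTask() {
  return { status: 'ok' };
}

// Run the task only when the scheduler calls POST /run-task.
function handleRequest(req, res) {
  if (req.method === 'POST' && req.url === '/run-task') {
    const result = runScheduledTask();
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(result));
    return;
  }
  res.writeHead(404);
  res.end();
}

// Call startServer() from your entry point; Cloud Run provides PORT.
function startServer(port = process.env.PORT || 8080) {
  return http.createServer(handleRequest).listen(port);
}
```

Because the work happens inside a request, the instance has CPU allocated while the task runs and can scale to zero in between.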
From my understanding, no.
According to the documentation, the CPU of idle instances is throttled to nearly zero. I suppose this means that very simple operations can still be performed (e.g. logging a string every hour). You could test it more extensively by running some more complex operations and measuring how long they take.
Either way, I would not count on it in a production environment. There is no guarantee that a CPU "throttled to nearly zero" will be able to complete the operations you need in a reasonable time.

Cloud Scheduler invoked Twice in 10 mins

We have a Cloud Scheduler job that runs every 6 hours and calls an application deployed in GKE (single Pod) via HTTP. We are observing strange behaviour where the application is called a second time 10 minutes into the initial run. The job runs for nearly an hour.
So, in the end, we see the same event being processed twice in parallel, 10 minutes apart. Any pointers will be helpful.

Google Cloud Function Timeout Setting doesn't work

I can't get a Google Cloud Function to run for more than 60 seconds, even when the timeout is set to 540 seconds! Any suggestions?
I set the timeout flag on deployment to --timeout=540, and I know the setting goes through, because the 540 s timeout appears in the GCP web UI. I have also tried to manually edit the timeout to 540 through the GCP web UI. But in any case I still get DEADLINE_EXCEEDED after just ~62000 ms.
I have tried both Pub/Sub and HTTP as the function trigger, but still get the premature function timeout at ~60 s.
I'm running the latest CLI, with these function settings:
trigger: http/pubsub (both tested, same result)
availableMemoryMb: 2048
runtime: nodejs6
status: ACTIVE
timeout: 540s
Thanks for any inputs!
Br Markus
I used the delay example from the documentation and ran a Cloud Function with the same specifications as yours. The documentation example delays execution by 120000 ms (2 minutes); I changed that to 500000 ms. Together with the normal time the function takes to execute, this gets close to the desired execution time (around 9 minutes). If you set the delay to 540000 ms, the function fails with a timeout error at ~540025 ms, because the delay alone already reaches the 540 s timeout configured for the function, which is also the maximum timeout of a Cloud Function (9 minutes).
I also tried creating the function with this command:
gcloud functions deploy [FUNCTION_NAME] --trigger-http --timeout=540
After successful deployment, I updated the code manually in the GCP Cloud Function UI as follows
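exports.timeoutTest = (req, res) => {
  // Respond only after 500000 ms (about 8.3 minutes), just under the 540 s timeout.
  setTimeout(() => {
    let message = req.query.message || req.body.message || 'Hello World today!';
    // send() also ends the response, so a separate res.end() is not needed.
    res.status(200).send(message);
  }, 500000);
};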
Both times the Cloud Function executed and returned status code 200. This means that you can set the timeout to more than 60 seconds, which is only the default value.
If you have checked everything and you still have this issue, I recommend starting afresh: create a new Cloud Function and follow the documentation example.
The 60-second timeout may not come from the GCP Cloud Function setting at all. For instance, if this is a Django/Gunicorn app, the timeout comes from the Gunicorn timeout that is set in app.yaml:
entrypoint: gunicorn -t 3600 -b :$PORT project_name.wsgi
For instance, this will achieve a timeout of 3600 seconds for gunicorn.
I believe I'm some years late but here is my suggestion.
If you're using the "Test the function" button in the "Testing" tab of the Cloud Function (in the GCP Cloud Console), it says right next to the button:
Testing in the Cloud Console has a 60s timeout. Note that this is different from the limit set in the function configuration.
I hope you fixed it and this answer can help someone in the future.
Update: Second try ("Test the function") was precisely 9 minutes
From: 23:15:38
Till: 23:24:38
And it is exactly the 9 minutes, although the message again was about 60 seconds only and popped up much earlier than the actual stop.
Function execution took 540004 ms, finished with status: 'timeout'
This time, with a lot of memory (2 GB), it was clearly the timeout that made it stop. My guess is that the message simply pops up earlier because it has not been implemented in detail. You should always look at the logs to see what is happening.
I guess that the core of your question is outdated then: at least as of 01/2022, you do get the configured timeout regardless of what the messages say, and you should just not care about the messages.
First try ("Test the function"): stopped after 8 minutes, after reaching the memory limit.
This is the message shown in 2022/01 once you get past the 60 seconds (with the 540 s maximum timeout set for this example function in the "Edit" menu of the Cloud Function):
Function being tested has exceeded the 60s timeout imposed by the Cloud Functions testing utility.
Yet, in reality, when using just the "Testing" tab, the timeout is at least 300 s (5 minutes), as can be seen next to the "Test the function" button:
Testing in the Cloud Console has a 5 minute timeout. Note that this is different from the limit set in the function configuration.
But it is even more than that. I know from testing (started from the "Testing" tab --> "Test the function" in the Cloud Function) that you have at least 8 minutes:
From 22:31:43:
Till 22:39:53
And this run was stopped first by the 256 MB memory limit and only secondly by time (it is a bit unclear why both messages appeared).
Therefore, instead of asking why you only get a 60-second timeout, you might rather ask why these messages are wrong (as in my case). Perhaps GCP did not make the effort to parametrize the messages for each function.
Perhaps you get slightly more time when you start the function with gcloud from the terminal, but that is not so likely, since 9 minutes is the maximum anyway.

Where and how to set up a function which is doing GET request every second?

I am trying to set up a function that will run somewhere on a server. It is a simple GET request and I want to trigger it every second.
I tried Google Cloud Functions and AWS. Neither has a straightforward way to run it every second (the shortest interval is 1 minute).
Could you please suggest a service, or combination of services, that will allow me to do this (preferably not a costly one)?
Here are some options on AWS ...
Launch a t2.nano EC2 instance to run a script that issues GET, then sleeps for 1 second, and repeats. You can't use cron (doesn't support every second). This costs about 13 cents per day.
If you are going to do this for months/years then reduce the cost by using Reserved Instances.
If you can tolerate periods where the GET requests don't happen then reduce the cost even further by using Spot instances.
That said, why do you need to issue a GET request every second? Perhaps there is a better solution here.
You can create an AWS Lambda function that simply loops, issues the GET request every second, and exits after 240 requests (i.e. 4 minutes). Then create a CloudWatch event that fires every 4 minutes and calls the Lambda function.
The interval is 4 minutes because the maximum timeout you could set for a Lambda function was 5 minutes at the time of writing (it has since been raised to 15 minutes).
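The loop described above could look roughly like this. It is a sketch under assumptions: `pollEverySecond`, its options, and the injected `issueGet` callback are names made up for this example, and the URL in the usage comment is a placeholder.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call issueGet() once per interval, `iterations` times in total.
// Failures are logged and do not stop the loop.
async function pollEverySecond(issueGet, { iterations = 240, intervalMs = 1000, wait = sleep } = {}) {
  let completed = 0;
  for (let i = 0; i < iterations; i += 1) {
    const started = Date.now();
    try {
      await issueGet();
      completed += 1;
    } catch (err) {
      console.error('GET failed:', err.message);
    }
    // Subtract the request time so the calls stay roughly one interval apart.
    const elapsed = Date.now() - started;
    if (i < iterations - 1) {
      await wait(Math.max(0, intervalMs - elapsed));
    }
  }
  return completed;
}

// Usage idea inside a Lambda handler (Node 18+ has a global fetch):
// exports.handler = async () => pollEverySecond(() => fetch('https://example.com/ping'));
```

If a single request takes longer than the interval, the next one starts immediately, so 240 iterations can run past 4 minutes; keep the function timeout above the expected worst case.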
This setup will likely incur only some trivial cost:
At 1 event per 4 minutes, it's $1/month for the CloudWatch events generated.
At 1 call per 4 minutes (each running for 4 minutes) to a minimally configured (128 MB) Lambda function, it's 324,000 GB-seconds of execution per month, just within the free tier of 400,000 GB-seconds.
Since network transfer into AWS is free, the response size of your GET request is irrelevant. And the first 1GB of transfer out to the Internet is free, which should cover all the GET requests themselves.
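The 324,000 GB-seconds figure can be checked with a few lines of arithmetic, assuming a 30-day month and that each invocation runs for the full 240 seconds. The function name `lambdaGbSeconds` is just for this example.

```javascript
// GB-seconds = memory in GB * seconds per invocation * number of invocations.
function lambdaGbSeconds(memoryMb, secondsPerInvocation, invocations) {
  return (memoryMb / 1024) * secondsPerInvocation * invocations;
}

// One invocation every 4 minutes over a 30-day month.
const invocationsPerMonth = (30 * 24 * 60) / 4; // 10800
const monthlyUsage = lambdaGbSeconds(128, 240, invocationsPerMonth);

console.log(monthlyUsage); // 324000
console.log(monthlyUsage <= 400000); // true: within the free tier quoted above
```

In a 31-day month the same setup comes to 334,800 GB-seconds, still below the 400,000 quoted above.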

Is there a way to set a walltime on AWS Batch jobs?

Is there a way to set a maximum running time for AWS Batch jobs (or queues)? This is a standard setting in most batch managers, which avoids wasting resources when a job hangs for whatever reason.
As of April 2018, AWS Batch supports setting a job timeout, either when submitting a job or in the job definition.
https://aws.amazon.com/about-aws/whats-new/2018/04/aws-batch-adds-support-for-automatic-termination-with-job-execution-timeout/
You specify an attemptDurationSeconds parameter, which must be at least 60 seconds, either in your job definition, or when you submit the job. When this number of seconds has passed following the job attempt's startedAt timestamp, AWS Batch terminates the job. On the compute resource, your job's container receives a SIGTERM signal to give your application a chance to shut down gracefully; if the container is still running after 30 seconds, a SIGKILL signal is sent to forcefully shut down the container.
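To make use of the 30 seconds between SIGTERM and SIGKILL, the job itself has to handle SIGTERM. A minimal sketch for a Node.js job follows; `installGracefulShutdown` and the 25-second safety margin are assumptions for this example, not something AWS Batch prescribes.

```javascript
// Run cleanup when SIGTERM arrives and exit before the SIGKILL would follow.
// `proc` is normally the global `process`; it is a parameter so it can be faked in tests.
function installGracefulShutdown(proc, cleanup, graceMs = 25000) {
  proc.once('SIGTERM', () => {
    // Safety net: exit anyway if cleanup takes longer than the grace period.
    const timer = setTimeout(() => proc.exit(1), graceMs);
    const finish = (code) => {
      clearTimeout(timer);
      proc.exit(code);
    };
    Promise.resolve()
      .then(cleanup)
      .then(() => finish(0), () => finish(1));
  });
}

// Usage idea:
// installGracefulShutdown(process, async () => { /* flush buffers, save a checkpoint */ });
```

Note that the signal only reaches your code if the Node process is the one receiving signals in the container, for example when it is started directly as the entry point rather than through a shell wrapper.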
Source: https://docs.aws.amazon.com/batch/latest/userguide/job_timeouts.html
POST /v1/submitjob HTTP/1.1
Content-type: application/json

{
   ...
   "timeout": {
      "attemptDurationSeconds": number
   }
}
AFAIK there is no feature to do this. However, a workaround was suggested in the forum for a similar question.
One idea is to call Batch as an Activity from Step Functions and ping
back on a schedule (e.g. every minute) from that job. If it stops
responding then you can detect that situation as a Timeout in the
activity and act accordingly (terminate the job etc.). Not an ideal
solution (especially if the job continues to ping back as a "zombie"),
but it's a start. You'd also likely have to store activity tokens in a
database to trace them to Batch job id.
Alternatively, you split that setup into 2 steps, and schedule a Batch
job from a Lambda in the first state, then pass the Batch job id to
the second step which then polls Batch (from another Lambda) for its
state with Retry and IntervalSeconds (e.g. once every minute, or even
with exponential backoff), and MaxAttempts calculated based on your
timeout. This way, you don't need any external state storage
mechanism, long polling or even a "ping back" from the job (it CAN be
a zombie), but the downside is more steps.
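The "MaxAttempts calculated based on your timeout" part of the quote can be sketched as follows. This assumes the polling state uses a Step Functions Retry with IntervalSeconds and, for exponential backoff, BackoffRate; the function name `maxAttemptsForTimeout` is hypothetical, and the result is only an approximation because it ignores the time each poll itself takes.

```javascript
// Number of polling attempts whose waits add up to at least timeoutSeconds.
// backoffRate = 1 means a fixed interval; values above 1 grow the interval each time.
function maxAttemptsForTimeout(timeoutSeconds, intervalSeconds, backoffRate = 1) {
  if (intervalSeconds <= 0 || backoffRate < 1) {
    throw new RangeError('intervalSeconds must be > 0 and backoffRate must be >= 1');
  }
  let attempts = 0;
  let waited = 0;
  let interval = intervalSeconds;
  while (waited < timeoutSeconds) {
    waited += interval;
    interval *= backoffRate;
    attempts += 1;
  }
  return attempts;
}

console.log(maxAttemptsForTimeout(3600, 60));    // 60: one poll per minute for an hour
console.log(maxAttemptsForTimeout(3600, 60, 2)); // 6: waits of 60, 120, 240, 480, 960, 1920 s
```

When the attempts run out without the job having finished, the state machine can treat that as the timeout and terminate the Batch job.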
There is no option to set a timeout on a Batch job, but you can set up a Lambda function that is triggered every hour or so and deletes jobs created more than, say, 24 hours ago.
I have been working with AWS for some time now and could not find a way to set a maximum running time for Batch jobs.
However, there are some alternative ways which you could use.
AWS Forum
Sadly, there is no way to set an execution time limit on AWS Batch.
One solution may be to edit the Docker entry point so that it enforces the execution time limit itself.
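One way to do that, sketched here in Node.js, is a small wrapper used as the container's entry point that starts the real command and stops it when the limit is reached. It relies on the `timeout` and `killSignal` options of `child_process.spawnSync`; the wrapper name `runWithTimeLimit` and the script name in the usage comment are assumptions for this example.

```javascript
const { spawnSync } = require('child_process');

// Run a command and send it SIGTERM if it is still running after limitMs.
function runWithTimeLimit(command, args, limitMs) {
  const result = spawnSync(command, args, {
    stdio: 'inherit',      // pass the job's output through to the container logs
    timeout: limitMs,
    killSignal: 'SIGTERM',
  });
  // spawnSync reports a timeout as an error with code ETIMEDOUT.
  const timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
  return { timedOut, status: result.status, signal: result.signal };
}

// Usage idea as an entry point (job.js is a placeholder for the real job):
// const outcome = runWithTimeLimit('node', ['job.js'], 60 * 60 * 1000);
// process.exit(outcome.timedOut ? 1 : outcome.status ?? 1);
```

On Linux images, the coreutils `timeout` command in front of the real command gives a similar effect without a wrapper script.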