Google Cloud Function Timeout Setting doesn't work - google-cloud-platform

I can't get a Google Cloud Function to run for more than 60 seconds, even though the timeout is set to 540 seconds. Any suggestions?
I set the timeout flag on deployment to --timeout=540, and I know the setting goes through, because the 540 s timeout appears in the GCP web UI. I have also tried manually editing the timeout to 540 through the web UI. But in every case I still get DEADLINE_EXCEEDED after just ~62,000 ms.
I have tried both Pub/Sub and HTTPS triggers for the function, but I still get the premature timeout at ~60 s.
I'm running the latest CLI with these function settings:
trigger: http/pubsub (both tested, same result)
availableMemoryMb: 2048
runtime: nodejs6
status: ACTIVE
timeout: 540s
Thanks for any inputs!
Br Markus

I used the delay code from the documentation and executed a Cloud Function with the same specifications as yours. In the documentation the execution is delayed by 120000 ms (2 min); I changed that to 500000 ms. Together with the normal time the function takes to execute, this gets close to the desired execution time (around 9 minutes). If you set the delay to 540000 ms instead, the function finishes with a timeout error at ~540025 ms, because the delay itself exceeds the function's configured timeout, which here is also the maximum timeout limit of a Cloud Function: 9 minutes.
I also tried creating the function with this command:
gcloud functions deploy [FUNCTION_NAME] --trigger-http --timeout=540
After a successful deployment, I updated the code manually in the Cloud Functions UI as follows:
exports.timeoutTest = (req, res) => {
  setTimeout(() => {
    let message = req.query.message || req.body.message || 'Hello World today!';
    // res.send() already ends the response, so no separate res.end() is needed.
    res.status(200).send(message);
  }, 500000);
};
Both times the Cloud Function executed and returned status code 200. This shows that you can set the timeout to more than the 60-second default.
If you have checked everything and still see this issue, I recommend starting afresh: create a new Cloud Function and use the documentation link I provided.

The 60-second timeout is not necessarily coming from the Cloud Function setting. For instance, if this is a Django/Gunicorn app, the timeout may come from gunicorn's own timeout, set in app.yaml:
entrypoint: gunicorn -t 3600 -b :$PORT project_name.wsgi
Here, -t 3600 gives gunicorn a timeout of 3600 seconds.

I believe I'm a few years late, but here is my suggestion.
If you're using the "Test the function" button in the "Testing" tab of the Cloud Function (in the GCP Cloud Console), it says right next to the button that:
Testing in the Cloud Console has a 60s timeout. Note that this is different from the limit set in the function configuration.
I hope you fixed it and this answer can help someone in the future.

Update: the second try ("Test the function") took precisely 9 minutes
From: 23:15:38
Till: 23:24:38
That is exactly 9 minutes, although the message again mentioned only 60 seconds and popped up much earlier than the actual stop.
Function execution took 540004 ms, finished with status: 'timeout'
This time, with plenty of memory (2 GB), the timeout clearly made it stop. The message probably just pops up earlier because it was not parametrized in detail, my guess. You should always look at the logs to see what is actually happening.
I guess the core of your question is outdated then: at least as of 01/2022 you do get the requested timeout, regardless of what the messages say, and you should simply not care about them.
First try ("Test the function"): stopped after 8 minutes, once the memory limit was reached
A screenshot of how it looks in 2022/01 if you run past the 60 seconds (with a 540 s maximum timeout set for this example function in the "Edit" menu of the Cloud Function):
Function being tested has exceeded the 60s timeout imposed by the Cloud Functions testing utility.
Yet in reality, when using just the "Testing" tab, the timeout hits only after at least 300 s / 5 minutes, which can be seen next to the "Test the function" button:
Testing in the Cloud Console has a 5 minute timeout. Note that this is different from the limit set in the function configuration.
But it is even more than that. I know from testing (started from the "Testing" tab → "Test the function" in the Cloud Function) that you get at least 8 minutes:
From 22:31:43:
Till 22:39:53
This run was stopped first by the 256 MB memory limit and only secondly by time (it is a bit unclear why both messages appeared).
Therefore, instead of asking why you get only a 60-second timeout, you might rather ask why these messages are wrong (as in my case). Perhaps GCP did not make the effort to parametrize the messages for each function.
Perhaps you get slightly more time when you start with gcloud from the terminal, but that is unlikely, since 9 minutes is the maximum anyway.

Related

Cloud Run crashes after 121 seconds

I'm triggering a long-running scraping Cloud Run service with a Pub/Sub topic and subscription trigger. Every time I run it, it crashes after 121.8 seconds, and I don't get why.
POST 503 556B 121.8s APIs-Google; (+https://developers.google.com/webmasters/APIs-Google.html) https://????.a.run.app/
The request failed because either the HTTP response was malformed or connection to the instance had an error.
I've got a built-in timeout trigger: when I set it to 1 minute, the function runs without any problems, but at 2 minutes the above error is triggered, so it must be something with the Cloud Run or subscription timeout settings. I've tried increasing those (read more below).
Things involved
1 x Cloud Run service
1 x Pub/Sub subscription
1 x Pub/Sub topic
These are the things I've checked
The timeout of the Cloud Run instance (900 sec)
The timeout of the Pubsub subscription (Acknowledgement deadline - 600 sec & Message retention duration - 10 minutes)
I've increased the memory to 4 GB, which is way above what is needed.
Anyone who can point me in the right direction?
This is almost certainly due to Node.js's default server timeout of 120 seconds (in Node versions before 13).
Try server.setTimeout(0) to remove this timeout.

A timeout was reached (45000 milliseconds) while waiting for the MyService service to connect

I have developed a Win32 service (SERVICE_WIN32_OWN_PROCESS) in C++ for Windows 10. It fails to start once in a while, with the following messages in the event log:
A timeout was reached (45000 milliseconds) while waiting for the MyService service to connect.
The MyService service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.
What kind of timeout is happening here?
I know that when a service starts up, there is a timeout of 30 seconds from the start of the executable to the call of StartServiceCtrlDispatcher(). I have a log statement just before the call to StartServiceCtrlDispatcher(), but I do not see it. Unfortunately, I do not have any log statements at the point where the service starts up. In between startup and StartServiceCtrlDispatcher(), I have a bit of initialization, but nothing that I would expect to take 30 seconds to finish.
My service never reaches StartServiceCtrlDispatcher() and I have not seen traces in the event log that it crashes.
So, why does the error message mention a timeout of 45 seconds and not 30 seconds? What does this timeout represent?
Edit: For now I am mostly interested in whether other people have experienced a similar timeout and figured out the reason. I need to debug my code, but I hope someone can give me a direction to concentrate my debugging on. Later I might need specific help with my code, once I know where to look :-)
Edit: Microsoft describes many kinds of timeouts in their service API documentation, but I have not seen any mention of a 45-second timeout, even though I have read the documentation for all the API calls I am using.
Note: I have not modified any timeouts in the system/registry, if such a thing is possible.
Edit:
Notes about my service.
The issue happens on a user's PC that I do not have direct access to.
My service starts up correctly most of the time; when it fails, it might be a Windows update during boot-up that causes it.
In a virtual machine with a debug build of my service, it takes less than 2 seconds from the start of the executable to the call of StartServiceCtrlDispatcher(). That sounds reasonable, and is far below both 30 and 45 seconds.
In my development environment I have tried adding a delay (sleep) of more than 30 seconds between startup and StartServiceCtrlDispatcher(). This gave me the standard message about a 30000 millisecond timeout. Not 45000!
I have also tried forcing a crash between startup and StartServiceCtrlDispatcher(). This gave me an "Application Error" event log entry about the crash and the standard 30000 ms timeout. Not 45000! In the event log of the problem PC, I have not noticed any "Application Error" entries when the startup failed.

Cloud Run finishes but Cloud Scheduler thinks that job has failed

I have a Cloud Run service setup and I have a Cloud Scheduler task that calls an endpoint on that service. When the task completes (http handler returns), I'm seeing the following error:
The request failed because the HTTP connection to the instance had an error.
However, the actual handler returns HTTP 200 and exits successfully. Does anyone know what this error means and under what circumstances it shows up?
I'm also attaching a screenshot of the logs.
Does your job take longer than 120 seconds? I was having the same issue and figured out that Node versions prior to 13 have a 120-second server.timeout limit. I installed Node 13 in my Docker image and the problem is gone.
Error 503 is returned by the Google Frontend (GFE). The Cloud Run service either has a transient issue, or the GFE has determined that your service is not ready or not working correctly.
In your log entries, I see a POST request, and 7 ms later the error 503. This tells me your Cloud Run application is not yet ready (readiness being determined by Cloud Run).
One minute, 8 seconds before that, I see a ReplaceService operation. This tells me your service was not yet in a running state, and that if you retry later, you will see success.
I've run an incremental sleep test on my Flask endpoint, which returns 200 after waiting 1 minute, 2 minutes, and 10 minutes. Having triggered the endpoint via Cloud Scheduler, the job failed only in the 10-minute test. I found that one of the properties of my Cloud Scheduler job was causing the failure. The following solved my issue.
gcloud scheduler jobs describe <my_test_scheduler>
There you'll see a property called attemptDeadline, which is set to 180 seconds by default.
You can update that property using:
gcloud scheduler jobs update http <my_test_scheduler> --attempt-deadline 1000s
Ref: scheduler update

Do Google Cloud background functions have max timeout?

We have been using Google Cloud Functions with HTTP triggers, but ran into the limitation of a maximum timeout of 540 s.
Our jobs are background jobs, typically data pipelines, with processing times often longer than 9 minutes.
Do background functions have this limit too? It is not clear to me from the documentation.
All functions have a maximum configurable timeout of 540 seconds.
If you need something to run longer than that, consider delegating that work to run on another product, such as Compute Engine or App Engine.
2nd generation Cloud Functions triggered via HTTPS can have a maximum timeout of 1 hour instead of the 9-minute (540 s) limit.
See also: https://cloud.google.com/functions/docs/2nd-gen/overview
You can then trigger this 2nd gen Cloud Function with, for example, Cloud Scheduler.
When creating the job in Cloud Scheduler, you can set the "Attempt deadline" config to up to 30 minutes. This is the deadline for job attempts; once it passes, the attempt is cancelled and considered failed.
See also: https://cloud.google.com/scheduler/docs/reference/rest/v1/projects.locations.jobs#Job
The maximum run time of 540 seconds applies to all Cloud Functions, no matter how they're triggered. If you want to run something longer you will have to either chop it into multiple parts, or run it on a different platform.

Slack slash command works sometimes

We have a Slack slash command that executes a Lambda (written in node) in AWS. The Lambda calls an internal service we have and returns JSON. It often takes multiple executions to get the slash command to work. The caller gets the below message:
Darn - that slash command didn't work. If you see this message more than once we suggest you contact "name".
We ran a bash script that called the Lambda once a minute for 12 hours. The average duration of the calls was about 1.5 seconds, well below the slash command's expectation that a response is returned within 3 seconds. Has anyone else experienced this issue?
Increase the Lambda timeout beyond 3 seconds, even though your estimated run time is around 1.5 seconds.
Also note that AWS Lambda limits the total concurrent executions across all functions within a given region to 100 (a default limit which can be increased on request).
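Separately from the Lambda timeout, Slack shows the "Darn" message whenever it receives no response within 3 seconds, so a 1.5 s average can still fail on tail latencies (cold starts, a slow internal service). A common pattern is to acknowledge immediately and deliver the real result later via Slack's response_url. A sketch under that assumption; the handler shape, dispatchWork, and the field names are illustrative, not from the original setup:

```javascript
// Immediate acknowledgement that Slack accepts within its 3-second window.
function ackPayload() {
  return {
    statusCode: 200,
    body: JSON.stringify({ response_type: 'ephemeral', text: 'Working on it...' }),
  };
}

// dispatchWork is a hypothetical hand-off: in a real deployment it might
// asynchronously invoke a second Lambda or enqueue a message; that worker
// then POSTs the final result to the response_url.
async function handler(event, dispatchWork) {
  const params = new URLSearchParams(event.body || ''); // Slack sends form-encoded bodies
  await dispatchWork({
    text: params.get('text'),
    responseUrl: params.get('response_url'),
  });
  return ackPayload();
}

exports.handler = handler;
```

This way the slow call to the internal service happens outside the 3-second window, and only the hand-off has to finish quickly.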