How to handle long requests in Google Cloud Run?

I have hosted my Node app in Cloud Run, and all of my requests are served within 300-600 ms. But one endpoint fetches data from a 3rd-party service, so that request takes 1.2-2.5 s to complete.
My questions regarding this are:
Are 1.2-2.5 s requests suitable for Cloud Run? Or is there a rule that requests should complete within some number of milliseconds?
Also, see the screenshot: along with the request, I got this message in the logs: "The request caused a new container instance to be started and may thus take longer and use more CPU than a typical request."
What caused a new container instance to be started?
Is there any alternative or workaround to handle long requests?
Any advice / suggestions would be greatly appreciated.
Thanks in advance.

I don't think that will be an issue unless you're worried about the cost of the CPU/memory time, which honestly should only matter if you're getting 10k+ requests/day. So it probably doesn't matter, and Cloud Run can handle that just fine (my own app serves requests longer than that with no problem).
It's possible that your service was "scaled to zero," meaning there were no container instances left running to serve requests. In that case, a new instance has to be started, and the request waits out whatever initialization/startup cost that process incurs. It's also possible that the service auto-scaled because all other instances were at their request limits. Make sure your setting for max concurrent requests per instance is greater than one; Node/Express can handle multiple requests at once. Plus, you'll only get charged for the total time spent, not per request.
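If concurrency was lowered at some point, it can be raised with a single command; a sketch, where the service name is a placeholder and 80 happens to be Cloud Run's default:

    gcloud run services update my-service --concurrency 80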
In situations where you have very long operations (30 seconds, minutes, or more), it may be a good idea to switch to a different data-transfer method. You could use polling, where the client makes a request every 5 seconds and checks whether the response is ready. You could also switch to a push-based system like WebSockets, but Cloud Run doesn't have support for that.
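As a rough illustration of the polling approach, here is a minimal Node sketch; the status-endpoint shape and the done/result fields are assumptions, not part of the original answer:

    // Poll a status endpoint every 5 s until the long-running
    // job reports that its result is ready.
    const axios = require('axios');

    async function pollForResult(statusUrl) {
      for (;;) {
        const { data } = await axios.get(statusUrl);
        if (data.done) return data.result; // server signals completion
        await new Promise((resolve) => setTimeout(resolve, 5000));
      }
    }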
TL;DR: longer requests (~10-30 seconds) should be fine unless you're worried about the cost of the increased compute time they may incur at scale.

Related

How can I cancel a download that's too big or slow in a lambda?

I'm using axios in a Lambda function to download a file from a user-provided URL. Obviously that file could be any size and might be served at any speed. I am concerned that this might create Denial of Service and Denial of Wallet risks.
I don't know if AWS has any charges for Lambda ingress; I haven't been able to find a definitive answer yet. Even if they don't, large uploads could still force my Lambdas to run for longer (costing me money) and potentially push me up against the rate limits I have set, in part, to mitigate flooding-attack risk (denying people service).
Likewise, very slow downloads might cause my Lambdas to run until they time out. My timeouts are set fairly high because there is processing to do once the file is downloaded. I'd rather bail after a small handful of seconds, as the input data should always be small and fast to fetch.
So what I want is for downloads to abort if they hit a preset maximum size in bytes OR a maximum download time.
If adding these limits isn't possible with axios, then I'm open to using different libraries like node-fetch.
On the axios side itself, you can set timeout and maxContentLength to limit the request time and download size. The Lambda max timeout is 15 minutes.
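A minimal sketch of that idea; the 5 s timeout and 10 MB cap are illustrative values:

    const axios = require('axios');

    // timeout aborts the request after 5 s; maxContentLength makes
    // axios reject responses whose body exceeds 10 MB.
    async function downloadWithLimits(url) {
      const response = await axios.get(url, {
        timeout: 5000,                       // milliseconds
        maxContentLength: 10 * 1024 * 1024,  // bytes
        responseType: 'arraybuffer',
      });
      return response.data;
    }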
If you expect many lengthy requests, it may be better to use EC2. Huge numbers of Lambda invocations at high memory and long duration end up more costly than EC2. Serverless is indeed cost-effective and operationally easy, especially for spiky workloads; for steady 24/7 workloads with long processing times, a VM is the better choice.

Requests take too long in a Django app deployed to AWS Lambda via Zappa

I have recently deployed a Django backend application to AWS Lambda using Zappa.
After the Lambda function has not been invoked for some time, the first request takes 10 to 15 seconds to be processed. At first I thought it was because of a cold start, but even for a cold start this time is unacceptable. Then, reading through Zappa's documentation, I saw that it enables the keep_warm feature by default, which sends a dummy request to the Lambda function every 4 minutes to keep it warm; so this excessive delay in the first request is not due to a cold start.
Then, I started using tools such as AWS X-Ray and Cloudwatch Insights to try to find the explanation for the delay. Here is what I found out:
The invocation that takes a very long time to be processed is the following:
Crossed out in red are the names of the environment variables the application uses. They are all defined and assigned values directly in the AWS Console. What I don't understand is, first, why it takes so long, and second, why it says the environment variables are cast as None. The application works perfectly (apart from the massive delay in the first request), so the environment variables are correctly set somewhere.
This request is made religiously every two hours, and also the first time someone invokes the Lambda function after it has been idle for a while, as seen in the following chart:
The dots along the x-axis correspond to Zappa's dummy requests that keep the server warm. The elevated dots correspond to the invocation shown in the previous image. Finally, the spike corresponds to a user invocation. The time it took to process is the sum of the time for the long invocation (the one shown in the first image) and the time for the longest HTTP request the client makes to the server. That request was the following:
It was a regular login request that should have been resolved much faster. Other requests that are probably more demanding than this one were resolved in less than 100 ms.
So, to sum up:
There is one Lambda invocation that takes more than 10 seconds to resolve (the one in the first image shown). It happens every 2 hours, and also when a user makes a request to the server after it has been idle for some time.
Some requests take more than 2 seconds to resolve, and I have no idea why.
Apart from these previous function invocations, all other requests are resolved in a reasonable time frame.
Any ideas as to why these invocations take so much time are very much appreciated, as I have spent quite some time trying to figure it out on my own and have run out of ideas. Thank you in advance!
Edit 1 (28/07/21): to further support my suspicion that this delay is not due to a cold start here is the "Segments Timeline" of the function in Cloudwatch/Application monitoring/Traces:
If it were a cold start, the delay should appear in the "Initialization" segment and not in the "Invocation" one.
Edit 2 (30/07/21): I forgot to mention that I had previously deployed the application using Elastic Beanstalk and didn't face this problem whatsoever so my code's performance is probably not the problem here.
Edit 3 (30/07/21): I found this thread in an AWS forum from 2016 regarding this exact issue. An AWS engineer mentioned that this behaviour is not by any means expected for a Lambda function outside of a VPC (like mine). Nevertheless, no answer was provided that explained the cause of the 10-15 seconds delay.
Edit 4 (03/08/21): I tried doubling the function's assigned memory (from 512 MB to 1024 MB) but it did not help.
I have also added some comments to the question to explain that this is probably not due to a cold start. As you rightly stated, cold starts are explicitly indicated and seem to only take about 500 ms in your case.
Cold starts this long usually only manifested themselves when Lambdas were run in a VPC, and AWS has since changed the way Lambdas get their network interfaces, which has dramatically sped up that process.
That being said, a quick Google search led me to some interesting discussions on other sites about Django applications and lazy loading. I'll share some links here (even though they are not related to Lambda) in the hope they can help you find a solution:
https://community.webfaction.com/questions/11560/django-app-seems-very-slow-to-start-up-10-seconds
https://ses4j.github.io/2015/11/23/optimizing-slow-django-rest-framework-performance/
As a last note about keep_warm: sending those dummy requests is quite an old trick in the book. However, be aware that there are no guarantees about how long AWS keeps a Lambda warm. If an Init duration is indicated in the logs, though, you can be sure it was a cold start.
If you need to ensure that a lambda function is warm and quick to respond to incoming requests, you'll have to use provisioned concurrency, which of course has its own price tag.
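If you go that route, provisioned concurrency can be configured with one CLI call; a sketch, where the function name, alias, and count are placeholders:

    aws lambda put-provisioned-concurrency-config \
      --function-name my-function \
      --qualifier live \
      --provisioned-concurrent-executions 2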
I can see some suggestions here about increasing the memory for your Lambda (and I also saw that you tried going from 512 to 1024). Have you tried increasing it further, say to about 3072 MB? It's a significant increase, but this is just to prove first that the problem is not due to resource limitations.
The keep_warm feature isn't guaranteed as far as I've seen, and the bulk of the (cold) start time is due to initialisation. Since the vCPU allocated to a Lambda is proportional to the memory you assign it, your Lambda may initialise more quickly and somewhat mitigate these cold starts.
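For what it's worth, the memory (and with it the vCPU share) can be bumped with a single call; the function name here is a placeholder:

    aws lambda update-function-configuration \
      --function-name my-function \
      --memory-size 3072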

How does Cloud Run scaling down to zero affect long-computation jobs or external API requests?

I'm new to Cloud Run, and the idea of scaling down to zero is very appealing to me, but I have questions about a few usage scenarios:
If I have a Cloud Run instance querying an external API endpoint, would the instance wind down while waiting for the response if no additional requests come in (i.e. I set the query timeout to 60 min, and no requests are received in those 60 min)?
If a Cloud Run instance is running a computation that lasts longer than 24 hours, or perhaps even days, without receiving requests, can it be trusted to carry the computation through to completion without being randomly shut down or restarted for servicing or other purposes? (I ask because Cloud Run is primarily intended for stateless applications, but I have infrequent computation jobs that may take a long time and could be considered "stateful" in a short-term context.)
Does CPU utilization impact auto-scaling? E.g., if I have a computationally intensive job, not configured for distributed computing, running on one instance, would this trigger Cloud Run to spawn additional instances?
If you deep-dive into the documentation, I'm quite sure you can find your answers. So, here is a summary:
(Interesting read.) Cloud Run instances are shut down only when they aren't in use (usually after about 15 minutes without handling a request; that figure is an observation, not a commitment, and can change at any time). In your case, if you are in a request-handling context, no worries: your instance won't be killed, because it is in use. Note: don't send the HTTP response before the end of the processing (see the sketch after these three points). Background processes/jobs aren't considered part of a request context; the context runs from receipt of the request to the response (OK or KO) being sent back. Partial responses/streaming are accepted.
A Cloud Run instance can potentially live for more than 24 h, but nothing is guaranteed. And because request handling is limited to 1 h, you can't run a process longer than that. I recommend you have a look at GKE Autopilot, or run a container on Compute Engine and stop the VM at the end of the processing to save resources and money (or, as a hack, run your container on AI Platform custom training: even if you train nothing, you get to run a custom container on a serverless platform!). If you can, I recommend designing your workload so it can be split into several small, parallelizable jobs.
Yes, it's described here. But keep in mind that a single request is processed by a single instance. If you send a request that triggers an intensive compute job, that request will be processed entirely on that one instance (which can have several vCPUs if your workload can use them). And if another request comes in during the intensive processing, another Cloud Run instance can be spawned to handle it, and it will handle only the new request.
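To illustrate the note in the first point, here is a minimal Node/Express sketch; the endpoint and the slow call are hypothetical. The response is sent only after the slow work completes, so the instance stays inside a request context the whole time:

    const express = require('express');
    const app = express();

    // Hypothetical stand-in for a slow external API call (~5 s).
    const callSlowExternalApi = () =>
      new Promise((resolve) => setTimeout(() => resolve({ ok: true }), 5000));

    app.get('/report', async (req, res) => {
      // Await the slow work *before* responding: the instance is only
      // guaranteed to stay up (and get CPU) while a request is pending.
      const data = await callSlowExternalApi();
      res.json(data);
    });

    app.listen(process.env.PORT || 8080);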

How to fix Cloud Run error 'The request was aborted because there was no available instance'

I'm using managed Cloud Run to deploy a container with concurrency=1. Once deployed, I'm firing four long-running requests in parallel.
Most of the time, all works fine. But occasionally, I'm facing 500s from one of the nodes within a few seconds; the logs only provide the error message quoted in the subject.
Using retries with exponential back-off did not improve the situation; the retries also end up with 500s. The Stackdriver logs do not provide further information either.
Potentially relevant gcloud beta run deploy arguments:
--memory 2Gi --concurrency 1 --timeout 8m --platform managed
What exactly does the error message mean, and how can I solve the issue?
This error message can appear when the infrastructure didn't scale fast enough to catch up with a traffic spike. The infrastructure keeps a request in a queue only for a certain amount of time (about 10 s) and then aborts it.
This usually happens when:
traffic suddenly increases sharply
cold start time is long
request time is long
We also faced this issue when traffic suddenly increased during business hours. The issue is usually caused by a sudden increase in traffic combined with a long instance start time. One way to handle this is to keep warm instances always running, i.e. by configuring the --min-instances parameter in the Cloud Run deploy command. Another, recommended, way is to reduce the service's cold-start time (which is difficult to achieve in some languages like Java and Python).
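For example, minimum instances can be set at deploy time; the service and image names are placeholders:

    gcloud run deploy my-service \
      --image gcr.io/my-project/my-image \
      --platform managed \
      --min-instances 2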
I also experienced the problem, and it's easy to reproduce. I have a Fibonacci container that takes about 6 s to compute fibo(45). I used hey to perform 200 requests, and I set my Cloud Run concurrency to 1.
Out of 200 requests, I got 8 such errors. In my case: a sudden traffic spike and a long processing time. (The cold start is short for me; it's written in Go.)
I was able to resolve this on my service by raising the max autoscaling container count from 2 to 10. There really should be no reason that 2 would be even close to too low for the traffic, but I suspect something about the Cloud Run internals was tying up the 2 containers somehow.
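For reference, the same change can be made from the CLI; the service name and count are placeholders:

    gcloud run services update my-service --max-instances 10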
Setting the Max Retry Attempts to anything but zero will remedy this, as it did for me.

Amazon EC2 EBS i/o costs

I am hosting my application on Amazon EC2, on one of their micro Linux instances.
It costs (apart from other costs) $0.11 per 1 million I/O requests. I was wondering how many I/O requests it would take if I have, say, 1000 users using it for, say, 1 hour per day for 1 month.
I guess my main concern is: if a hacker keeps hitting my login page (simple HTML), will it increase the I/O request count? I guess yes, as the server needs to do something every time to serve that page.
There are a lot of factors that will impact your I/O requests; as @datasage says, try it and see how it behaves under your scenario. Micro Linux instances are incredibly cheap to begin with, but if you are really concerned, set up a billing alert that will notify you when your usage passes a predetermined threshold; if it suddenly spikes, you can take action and shut things down if that is what you want.
https://portal.aws.amazon.com/gp/aws/developer/account?ie=UTF8&action=billing-alerts
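For a purely illustrative sense of scale (assumed numbers, not measurements): a sustained 100 I/O operations per second over a 30-day month is 100 × 86,400 × 30 ≈ 259 million requests, or about $28.50 at $0.11 per million; a lightly loaded site averaging 10 IOPS would come to under $3 a month.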
Take a look at CloudWatch and (for free) set up VolumeWriteOps and VolumeReadOps alarms that work with Amazon Simple Notification Service (SNS) to send you a text message and email notice right away if things get too busy, before the bill gets high! (A billing alert will let you know too late, after the threshold has already been reached.)
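A sketch of such an alarm via the AWS CLI; the volume ID, SNS topic ARN, and threshold are placeholders you'd tune to your own baseline:

    aws cloudwatch put-metric-alarm \
      --alarm-name ebs-write-ops-high \
      --namespace AWS/EBS \
      --metric-name VolumeWriteOps \
      --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
      --statistic Sum --period 300 \
      --comparison-operator GreaterThanThreshold \
      --threshold 100000 --evaluation-periods 1 \
      --alarm-actions arn:aws:sns:us-east-1:123456789012:billing-alerts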
In general though, from my experience, you will not have the problem you outline. Scan the EC2 Discussion Forum at forums.aws.amazon.com, where you would find evidence of this kind of problem if it were prevalent; it does not seem to be happening.
@Dilpa yes, you are right. If a brute-force attack hits your website, e.g. somebody hammering your login page, it will increase server I/O if you have logging enabled for your web server. The web server writes every event to its log files, and that will increase your I/O. Check your web server logs for this kind of attack so you can block it.