API Gateway V2 slowness on initial request

I've got an API Gateway V2 (protocol: HTTP) endpoint that simply forwards a request to my Lambda function and returns the response. I've noticed that if I make no requests for about 10 minutes, the next request is WAY slower than the ones after it. It's the same function doing the same thing every time, so I'm not sure why this happens. Has anyone else run into this and found a solution?

The reason for this is that your Lambda function has to be started before it can handle requests.
This is also called a cold start.
Starting a new instance of your Lambda does take some time. Once it is started, it will serve several requests. At some point the AWS Lambda service is going to shut down your Lambda function, for example when there has not been any traffic for a while.
That's where your observation comes from:
I've noticed that if I make no requests for about 10 minutes, the next request is WAY slower than the ones after it.
When there are no instances of your Lambda running and a new request comes in, the AWS Lambda service needs to instantiate a "fresh" instance of your Lambda.
You could read this blog post, which covers the topic:
https://aws.amazon.com/blogs/compute/operating-lambda-performance-optimization-part-1/
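
To make the mechanics concrete, here is a minimal sketch of a Python handler (function and variable names are illustrative) showing that module-level initialization runs once per cold start, while the handler itself runs on every request:

```python
import time

# Module-level code runs once per cold start, when the Lambda service
# creates a fresh execution environment for this function.
INIT_STARTED = time.time()
time.sleep(1)  # stand-in for heavy imports, SDK clients, DB connections
INIT_FINISHED = time.time()

def handler(event, context):
    # The handler body runs on every invocation; the init cost above is
    # only paid by the first request served by this instance.
    return {
        "statusCode": 200,
        "body": (
            f"init took {INIT_FINISHED - INIT_STARTED:.2f}s; "
            f"instance alive for {time.time() - INIT_FINISHED:.1f}s"
        ),
    }
```

Hitting this after an idle period shows the one-second init cost again, because a fresh instance had to be created.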

Related

AWS Lambda is not invoked the first time in the day it is called, but later invocations work

I have two Lambda functions in my AWS account. One acts as a custom authorizer and the other acts as a notification service that calls the Firebase FCM notification service.
When a request is made to the notification Lambda for the first time in a day, there is no response. The Lambda does not work and hence does not call the Firebase service.
It seemed like a cold start problem to me, so I set provisioned concurrency to 1 for both the auth and notification Lambdas in the hope that it would help. But the problem persists.
CloudWatch logs are of no help at all, since nothing gets printed that I could use to figure out the issue. Either the authorizer Lambda goes cold and does not respond, or the primary notification Lambda goes cold and does not respond, or both of them have issues.
After the first call to the Lambda fails, any subsequent calls work like a charm.
I do not want to install any plugin to keep the Lambda warm (not an option for the client), so is there some other way I can diagnose this problem and fix it?
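
One thing worth verifying in a setup like this (a hedged suggestion, since the question doesn't show the configuration): provisioned concurrency only attaches to a published version or alias, never to $LATEST, so if API Gateway invokes $LATEST the pre-warmed instances are never used. A quick boto3 check, with hypothetical function and alias names:

```python
import boto3

lambda_client = boto3.client("lambda")

# Hypothetical names; provisioned concurrency must be attached to a
# published version or an alias such as "live", never to $LATEST.
resp = lambda_client.get_provisioned_concurrency_config(
    FunctionName="notification-service",
    Qualifier="live",
)
print(resp["Status"])  # expect "READY" once instances are allocated
print(resp["AvailableProvisionedConcurrentExecutions"])
```

If the status isn't READY, or the API Gateway integration points at $LATEST rather than the alias, you would still see cold starts despite the configuration.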

Lambda and Fargate errors/timeouts

I have a Python API that I have tried on VMs, Fargate, and Lambda.
VMs - fewest errors when capacity is large enough.
Fargate - second fewest errors when capacity is large enough, but when autoscaling I get some 500 errors; it looks like it doesn't autoscale quickly enough.
Lambda - less consistent. When there are a lot of API calls, fewer errors, but from a cold start it may periodically fail. I do not pre-provision; when I do, I get fewer errors too.
I read in the post below that a cold start for Lambda is less than 1 second? It seems like it's more. One caveat is that each Lambda function will check for an existing "env" file; if it does not exist, it will download it from S3. However, this happens only when the API is hit: the Lambda function is listening and responding, and when you hit the API it will respond and connect, download the .env file, and process the API call further. Fargate does the same, but again with fewer errors. Any thoughts?
I can pre-provision, but it gets kind of expensive. At that point I might as well go back to VMs with autoscaling groups, but that's less cloud native. The VMs provide the fastest response by far but are harder to manage.
Can an AWS Lambda cold start cause an API Gateway timeout (30 s)?
I'm using an ALB in front of Lambda and Fargate. The VMs simply use round-robin DNS.
Questions:
Am I doing something wrong with Fargate or Lambda? Are they all right for APIs, or should I just go back to VMs?
What or who maintains the API connection while Lambda is starting up from a cold start? Can I have it retry or hold on to the connection longer?
Thanks!
Am I doing something wrong with Fargate or Lambda? Are they all right for APIs, or should I just go back to VMs?
The one thing that strikes me is downloading the env file from S3. Wouldn't it be easier and faster to keep your env data in SSM Parameter Store? Or perhaps pass it as environment variables to the Lambda function itself.
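
As a sketch of that suggestion (the parameter name /myapi/env is hypothetical), the value can be fetched once per execution environment and cached at module level, so only cold starts pay the lookup cost:

```python
import boto3

ssm = boto3.client("ssm")
_env_cache = None  # cached per execution environment

def get_env():
    # Only the first request on a cold instance pays for the lookup;
    # warm invocations reuse the cached value.
    global _env_cache
    if _env_cache is None:
        resp = ssm.get_parameter(Name="/myapi/env", WithDecryption=True)
        _env_cache = resp["Parameter"]["Value"]
    return _env_cache

def handler(event, context):
    env = get_env()
    return {"statusCode": 200, "body": f"config loaded ({len(env)} chars)"}
```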
What or who maintains the API connection while Lambda is starting up from a cold start? Can I have it retry or hold on to the connection longer?
API Gateway. Sadly, you can't extend the 30 s time limit; it's a hard limit.
I'm using an ALB in front of Lambda and Fargate.
It seems to me that you have API Gateway -> ALB -> Lambda function. Why would you need the ALB in that chain? Usually there is no such need.
I can pre-provision, but it gets kind of expensive.
Sadly, this is the only way to minimize cold starts.
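
For completeness, a minimal boto3 sketch of configuring it (names are hypothetical); note that it must target a published version or alias and is billed for as long as it stays configured:

```python
import boto3

lambda_client = boto3.client("lambda")

# Hypothetical names; this keeps 2 instances initialized at all times
# and is billed for as long as the configuration exists.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-api-function",
    Qualifier="live",  # must be a published version or alias
    ProvisionedConcurrentExecutions=2,
)
```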

How to reduce AWS API Gateway response time

A response from an AWS API Gateway API integrated with a Lambda function takes a lot more time compared to a regular Node project on AWS Elastic Beanstalk.
Is there any way to reduce the response time for AWS API Gateway?
There's definitely more information needed to answer this question, but from what you've said, your problems may be caused by the cold start time of Lambda functions. An Elastic Beanstalk stack spins up EC2 instances (which are ready once they're spun up and remain so until they're removed). Lambda creates instances of your handler as needed to handle incoming traffic. The first time you call a Lambda, it needs to provision an environment for the function. Depending on the language used, this can take some time. Successive requests should be faster, unless you wait a while (in which case the Lambda needs to re-initialize).
So here's more information that would be useful in case this answer is not helpful:
How much slower is Lambda than your Elastic Beanstalk stack?
Is it slower only on the first couple requests or does it continue being slow when you keep requesting?
Is it slow every day or only occasionally?
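
One way to answer those questions yourself: every cold start writes a REPORT line to CloudWatch Logs that includes an "Init Duration" field, which warm invocations omit. A small sketch (the log group name is hypothetical) that pulls only the cold-start reports:

```python
import boto3

logs = boto3.client("logs")

# "Init Duration" only appears in the REPORT line of cold starts, so
# this surfaces exactly the slow first requests.
resp = logs.filter_log_events(
    logGroupName="/aws/lambda/my-api-function",
    filterPattern='"Init Duration"',
)
for event in resp["events"]:
    print(event["message"].strip())
```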

Warming a Lambda function with CloudWatch schedule rules

I'm trying to warm a Lambda function (inside a VPC, accessing a private RDS instance) with CloudWatch. The rate is 5 minutes (only for experimenting); I intend to make it 35 minutes later on.
After I saw the CloudWatch logs indicating that the function had been called (and completed: I have set up a condition to return an API Gateway response immediately if no input was given), I called the function through the API Gateway URL.
However, I'm still getting a cold start: it returns a response in 2 s. If I do it again, I get the response in 200 ms.
So my questions are:
What did I do wrong? Can I actually warm a Lambda function with a CloudWatch schedule?
Does dropping the request immediately affect this behaviour? The DB connection is not established if the request comes from CloudWatch.
Thanks!
****EDIT****
I tried connecting to the DB before dropping out of the function when it's called by CloudWatch, but it doesn't change anything. The first request through the API still takes around 2 s and the next ones are around 200 ms.
****EDIT 2****
I tried removing the schedule entirely, and the cold start reaches 9 s. So I guess the 2 s already excludes most of the cold start. Is it possible that the problem lies in another service, such as API Gateway?
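
For reference, a common shape for this warming pattern is sketched below, assuming a Python function and the pymysql driver (both assumptions, since the question doesn't say): the scheduled event is detected by its "aws.events" source, and the DB connection is opened at module level so the warmer keeps it alive too.

```python
import os
import pymysql  # assumed driver; any MySQL client behaves the same

# Opening the connection at module level means the scheduled ping keeps
# both the execution environment and the DB connection warm.
conn = pymysql.connect(
    host=os.environ["DB_HOST"],
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
)

def handler(event, context):
    # CloudWatch/EventBridge scheduled events arrive with source
    # "aws.events"; return before any business logic runs.
    if event.get("source") == "aws.events":
        return {"statusCode": 200, "body": "warm"}
    with conn.cursor() as cur:  # real requests reuse the open connection
        cur.execute("SELECT 1")
    return {"statusCode": 200, "body": "ok"}
```

If a residual 2 s remains even with something like this in place, comparing API Gateway's latency metrics against the function's own reported Duration is a reasonable way to locate which hop is slow.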

Trigger RDS Lambda on CloudFront access

I'm serving static JS files from my S3 bucket over CloudFront, and I want to monitor whoever accesses them. I don't want to do it through CloudWatch and the like; I want to log it on my own.
For every request to CloudFront I'd like to trigger a Lambda function that inserts data about the request into my MySQL RDS instance.
However, CloudFront limits viewer request and viewer response triggers too much: a 1-second timeout (which is too little to connect to MySQL), no VPC configuration for the Lambda (so I can't even access the RDS subnet), and so on.
What is the optimal way to achieve this? Setting up an API Gateway, and if so, how would I send a request to it?
The typical method to process static content (or any content) accessed from CloudFront is to enable logging and then process the log files.
To handle CloudFront edge events, which can include processing and changing a request or response, look into Lambda@Edge.
I would enable logging first and monitor the traffic for a while. When bad actors hit your web site (CloudFront distribution), they will generate massive traffic, which could result in some sizable bills with Lambda@Edge. I would also recommend looking into AWS WAF to help mitigate denial-of-service attacks, which may reduce the amount of Lambda processing.
This seems like a suboptimal strategy, since CloudFront suspends request/response processing while the trigger code is running -- the Lambda code in a Lambda@Edge trigger has to finish executing before processing of the request or response continues, hence the short timeouts.
CloudFront provides logs that are dropped multiple times per hour (depending on the traffic load) into a bucket you select, which you can capture from an S3 event notification, parse, and insert into your database.
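
As a sketch of that log-processing route (the field handling follows the standard CloudFront access-log format; everything else is illustrative), an S3-event-triggered function, which can live inside your VPC, might look like:

```python
import gzip
import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by S3 event notifications for new CloudFront log files,
    # which are gzipped, tab-separated, with '#'-prefixed header lines.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        for line in gzip.decompress(body).decode("utf-8").splitlines():
            if line.startswith("#"):
                continue  # skip the Version/Fields header lines
            fields = line.split("\t")
            # fields[0] is the date, fields[1] the time, fields[4] the
            # client IP; this is where the RDS insert would go.
            print(fields[0], fields[1], fields[4])
```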
However...
If you really need real-time capture, your best bet might be to create a second Lambda function, inside your VPC, that accepts the data structures provided to the Lambda@Edge trigger.
Then, inside the code for the viewer request or viewer response trigger, all you need to do is use the built-in AWS SDK to invoke your second Lambda function asynchronously, passing the event to it.
That way, the logging task is handed off, you don't wait for a response, and the CloudFront processing can continue.
I would suggest that if you really want to take this route, this will be the best alternative. One Lambda function can easily invoke a second one, even if the second function is not in the same account, region, or VPC, because the invocation is done by communicating with the Lambda service's endpoint API.
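
A minimal sketch of that hand-off, written in Python (Lambda@Edge also supports Python runtimes; the function name and region here are hypothetical):

```python
import json
import boto3

# The second function lives in one fixed region; both the region and
# the function name are placeholders.
lambda_client = boto3.client("lambda", region_name="eu-west-1")

def viewer_request_handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    # InvocationType="Event" is fire-and-forget: the call returns as
    # soon as the Lambda service accepts the event, without waiting for
    # the logging function to run.
    lambda_client.invoke(
        FunctionName="request-logger",
        InvocationType="Event",
        Payload=json.dumps(request).encode("utf-8"),
    )
    return request  # let CloudFront continue processing unchanged
```

Note that even a fire-and-forget invoke still costs one round trip to the Lambda API in the second function's region, which is what the next point addresses.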
But there's still room for some optimization, because you have to take another aspect of Lambda@Edge into account, and it's indirectly related to this:
no VPC configuration for the Lambda
There's an important reason for this. Your Lambda@Edge trigger code runs in the region closest to the edge location that is handling traffic for each specific viewer. Your Lambda@Edge function is provisioned in us-east-1, but it's then replicated to all the regions, ready to run whenever CloudFront needs it.
So, when you are calling that second Lambda function mentioned above, you'll actually be reaching out to the Lambda API in the second function's region -- from whichever region is handling the Lambda@Edge trigger for this particular request.
This means the further apart the two regions are, the longer the delay.
Thus your truly optimal solution (for performance purposes) is slightly more complex: instead of the Lambda@Edge function invoking the second Lambda function asynchronously by making a request to the Lambda API, you can create one SNS topic in each region and subscribe the second Lambda function to each of them. (SNS can invoke Lambda functions across regional boundaries.) Then your Lambda@Edge trigger code simply publishes a message to the SNS topic in its own region, which will immediately return a response and asynchronously invoke the remote Lambda function (the second function, which is in your VPC in one specific region). Within your Lambda@Edge code, the environment variable process.env.AWS_REGION gives you the region where you are currently running, so you can use it to send the message to the correct SNS topic with minimal latency. (When testing in the Lambda console, this is always us-east-1.)
Yes, it's a bit convoluted, but it seems like the way to accomplish what you are trying to do without imposing substantial latency on request processing -- Lambda@Edge hands off the information as quickly as possible to another service that assumes responsibility for actually generating the log message in the database.
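
A sketch of the per-region SNS variant (the account ID and topic name are placeholders; in Python the running region comes from os.environ["AWS_REGION"], the equivalent of process.env.AWS_REGION):

```python
import json
import os
import boto3

# Placeholders: one topic with this name exists in every region, each
# subscribed to the second (VPC-resident) Lambda function.
ACCOUNT_ID = "123456789012"
TOPIC_NAME = "cloudfront-access-log"

def viewer_request_handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    region = os.environ["AWS_REGION"]  # region serving this viewer
    sns = boto3.client("sns", region_name=region)
    # Publishing in the local region keeps the synchronous hop short;
    # SNS then invokes the remote function across the region boundary.
    sns.publish(
        TopicArn=f"arn:aws:sns:{region}:{ACCOUNT_ID}:{TOPIC_NAME}",
        Message=json.dumps(request),
    )
    return request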
Lambda and relational databases pose a serious challenge around concurrency, connections and connection pooling. See this Lambda databases guide for more information.
I recommend using Lambda@Edge to talk to a service built for higher concurrency as the first step of recording access. For example, you could have your Lambda@Edge function write access records to SQS, and then have a background worker read from SQS and write to RDS.
Here's an example of Lambda@Edge interacting with STS to read some config. It could easily be refactored to write to SQS.
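
A rough sketch of that SQS hand-off (the queue name is hypothetical; one queue per region, mirroring the SNS approach above):

```python
import json
import os
import boto3

QUEUE_NAME = "access-log-queue"  # hypothetical; one queue per region

def viewer_request_handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    sqs = boto3.client("sqs", region_name=os.environ["AWS_REGION"])
    queue_url = sqs.get_queue_url(QueueName=QUEUE_NAME)["QueueUrl"]
    # SQS absorbs the burst; a separate worker drains the queue into
    # RDS using a small, stable pool of database connections.
    sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(request))
    return request
```

The background worker can then use a fixed, small number of DB connections regardless of how many edge invocations occur, which sidesteps the connection-pooling problem mentioned above.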