I'm using a SAM template to deploy an API Gateway that calls a couple of Lambdas.
All Lambdas are written in Java 11 and are connected to a VPC. This adds a long delay (about 30 seconds) between invoking the function and the code actually starting up.
I used

    AutoPublishAlias: "latestAlias"
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 1

to enable Provisioned Concurrency, but I still get long starts of about 30 seconds from time to time.
I also tried using ping events via AWS::Events::Rule to call the Lambdas every 9 minutes to keep them warm (I recall reading that 15 minutes of inactivity leads to Lambda containers being destroyed). I left Provisioned Concurrency enabled while doing this, and I still see cold starts. I also see new log streams being created in CloudWatch Logs, indicating that a new container has been spun up.
I am using X-Ray to verify that I don't have multiple simultaneous requests that might trigger the creation of a new container.
Am I missing something in enabling this? From the documentation I understand that "Initialization" should not appear in X-Ray once the Lambda has been started. Also, AWS Hyperplane was supposed to reduce Lambda-VPC integration times to around 1 second (I am deploying to eu-central-1, which should have had Hyperplane for a while now).
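For reference, the keep-warm ping described above is usually paired with a handler that recognises the scheduled event and returns early. The following is only a minimal sketch in Python (the functions in the question are Java); `handle_request` is a hypothetical placeholder for the real work, and the check relies on the `source` and `detail-type` fields that scheduled events carry.

```python
def is_scheduled_ping(event):
    """Return True when the event looks like a scheduled keep-warm ping."""
    return (
        isinstance(event, dict)
        and event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def handle_request(event):
    """Hypothetical placeholder for the real request handling."""
    return {"statusCode": 200, "body": "ok"}


def handler(event, context):
    if is_scheduled_ping(event):
        # Return early: the ping only needs to touch the execution environment.
        return {"warmed": True}
    return handle_request(event)
```

One thing worth checking with this approach: provisioned concurrency is attached to the alias or version, so a ping that targets the unqualified function ARN would run against $LATEST and would not exercise the provisioned environments at all.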
Related
I am new to AWS and having some difficulty tracking down and resolving some latency we are seeing on our API. Looking for some help diagnosing and resolving the issue.
Here is what we are seeing:
If an endpoint hasn't been hit recently, then on the first request we see a 1.5-2s delay marked as "Initialization" in the CloudWatch Trace.
I do not believe this is a cold start, because each endpoint is configured to have 1 provisioned concurrency, so we should not get a cold start unless there are 2 simultaneous requests. Also, the billed duration includes this initialization period.
A cold start happens when the first request hits AWS Lambda: a container has to be prepared to run your code, which takes some time and delays the request.
When a second request hits Lambda and the container is already up and running, it is processed quickly.
https://aws.amazon.com/blogs/compute/operating-lambda-performance-optimization-part-1/
This is the default behavior of cold start, but since you said that you were using provisioned concurrency, that shouldn't happen.
Provisioned concurrency takes some time to become active in the account. You can follow these steps to verify whether this Lambda was initialized on demand or with provisioned concurrency.
AWS Lambda sets an environment variable called AWS_LAMBDA_INITIALIZATION_TYPE that contains the value on-demand or provisioned-concurrency.
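A minimal sketch of that check in Python, assuming you are free to log from the handler. The function name `initialization_type` and the `"unknown"` fallback are choices made for this example, not part of the Lambda API; only the environment variable name comes from Lambda itself.

```python
import os


def initialization_type(environ=os.environ):
    """Read how this execution environment was initialized.

    Falls back to "unknown" when the variable is absent (e.g. when run locally).
    """
    return environ.get("AWS_LAMBDA_INITIALIZATION_TYPE", "unknown")


# Evaluated once per execution environment, during the init phase.
INIT_TYPE = initialization_type()


def handler(event, context):
    # Log the value so it can be correlated with slow requests in CloudWatch Logs.
    print(f"initialization type: {INIT_TYPE}")
    return {"initializationType": INIT_TYPE}
```

If the slow requests log the on-demand value, they were not served by a provisioned environment, which points at the invocation target (alias versus $LATEST) rather than at the function code.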
I have some ECS tasks running in AWS Fargate which in very rare cases may "die" internally, but will still show as "RUNNING" and not fail and trigger the task to restart.
What I would like to do, if possible is check for the absence of logs, e.g. if logs haven't been written in 30 minutes, trigger a lambda to kill the ECS task which will cause it to start back up.
The health check functionality isn't sufficient.
If this isn't possible, are there any other approaches I could consider?
You can use a metric with anomaly detection, but turning logs into a metric may cost money, and the alarm may cost money too. I would rather run a Lambda every 30 minutes that checks whether logs are there and kills the ECS task as needed. You can run a Lambda on an interval with CloudWatch Events (EventBridge).
Logs are probably sent from ECS to a CloudWatch Logs log group. If the log group has a static name, you can use the SDK to describe the streams inside the group. This API call tells you the timestamp of the last event in a stream.
Inside the Lambda Node.js runtime, aws-sdk v2 is already present, so you can require it without installing it (this applies to Node.js 16 and earlier runtimes; Node.js 18 and later bundle v3 instead). Here is the doc for v2:
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/CloudWatchLogs.html#describeLogStreams-property
Set orderBy: "LastEventTime" with descending: true so the most recent stream comes first and, to save network time, lower the limit from the default of 50 to limit: 1. The result will contain lastEventTimestamp.
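The check described above could look roughly like this in Python with boto3 (the answer refers to the Node.js SDK; the API call and its parameters are the same). This is a sketch under assumptions: the environment variable names `LOG_GROUP`, `ECS_CLUSTER` and `ECS_SERVICE` are hypothetical, and it stops every task of the service, which is only reasonable when the log group belongs to that one service.

```python
import os
import time

STALE_AFTER_MS = 30 * 60 * 1000  # 30 minutes, as in the question


def last_event_ms(logs_client, log_group_name):
    """Return the newest lastEventTimestamp in the group, or None if there is none."""
    response = logs_client.describe_log_streams(
        logGroupName=log_group_name,
        orderBy="LastEventTime",
        descending=True,  # newest stream first
        limit=1,
    )
    streams = response.get("logStreams", [])
    if not streams:
        return None
    return streams[0].get("lastEventTimestamp")


def is_stale(last_ms, now_ms, threshold_ms=STALE_AFTER_MS):
    """Treat a missing timestamp or one older than the threshold as stale."""
    return last_ms is None or now_ms - last_ms > threshold_ms


def handler(event, context):
    import boto3  # imported here so the helpers above work without the SDK

    logs = boto3.client("logs")
    ecs = boto3.client("ecs")
    # Hypothetical configuration names, assumed for this sketch.
    log_group = os.environ["LOG_GROUP"]
    cluster = os.environ["ECS_CLUSTER"]
    service = os.environ["ECS_SERVICE"]

    now_ms = int(time.time() * 1000)
    if not is_stale(last_event_ms(logs, log_group), now_ms):
        return {"stopped": []}

    # Stopping the tasks lets the service scheduler start replacements.
    task_arns = ecs.list_tasks(cluster=cluster, serviceName=service)["taskArns"]
    for task_arn in task_arns:
        ecs.stop_task(cluster=cluster, task=task_arn, reason="no logs for 30 minutes")
    return {"stopped": task_arns}
```

One caveat to verify before relying on this: the CloudWatch Logs documentation describes lastEventTimestamp as eventually consistent, so a tight threshold may produce false positives. Comparing against lastIngestionTime, or allowing a longer window, may be safer.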
anomaly detection:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Anomaly_Detection.html
alarms:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
Check the pricing for these. There is a free tier, so it may not cost you anything, but it is easy to build up real spend with CloudWatch: https://aws.amazon.com/cloudwatch/pricing/
To run lambda on interval:
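As an illustration only, a scheduled rule can be created with boto3 along these lines. The rule name, target id and statement id are hypothetical, and this sketch is not taken from whatever resource the answer originally pointed to.

```python
def rate_expression(value, unit):
    """Build a schedule expression such as "rate(30 minutes)".

    `unit` is given in the singular: "minute", "hour" or "day".
    """
    if unit not in ("minute", "hour", "day"):
        raise ValueError(f"unsupported unit: {unit}")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # The unit is singular for 1 and plural otherwise.
    return f"rate({value} {unit if value == 1 else unit + 's'})"


def schedule_lambda(rule_name, function_arn, expression):
    """Create (or update) a scheduled rule that invokes the given function."""
    import boto3  # imported here so rate_expression works without the SDK

    events = boto3.client("events")
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=expression,
        State="ENABLED",
    )["RuleArn"]
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "log-watchdog", "Arn": function_arn}],  # hypothetical id
    )
    # The rule also needs permission to invoke the function.
    boto3.client("lambda").add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",  # hypothetical statement id
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn
```

For example, `schedule_lambda("log-watchdog", function_arn, rate_expression(30, "minute"))` would run the function every 30 minutes.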
I have enabled provisioned concurrency (5 units) on one of my lambda functions which is invoked via API Gateway requests.
My impression was that this eliminates "cold starts" or "inits" altogether, yet when I hit the relevant API endpoint after an "idle period", X-Ray and CloudWatch show me what I would have thought is a cold start:
Init Duration: 2379.57 ms
Subsequent requests don't have this and my total time for the request goes from approx 3.8s to approx 200ms.
My understanding was that requests would automatically hit the provisioned instances first and thus ALWAYS produce the latter 200ms scenario (assuming we are not exceeding the concurrency count). I've created a separate Lambda and am the only one using it, so I know that is not the issue. I've created an alias, which points to a version, not $LATEST.
Lambda metrics display "Invocations" "ProvisionedConcurrencyInvocations" with the appropriate colour on the chart which indicated provisioned concurrency.
Why am I getting what is seemingly a cold start still?
If all the above seems OK and my understanding is indeed what should be happening then would it be related to the services used in the lambda itself?
I use S3, SQS and DynamoDB.
The Lambda is a C# .NET Core 3.1 function.
Edit #1 -
I should clarify that the function is actually an ASP.NET Core 3.1 Web API serverless application and has all the usual ConfigureServices/Configure entry points as well as Controller classes.
I have timed ConfigureServices, Configure and the controller's constructor and written the results to the console. The times are 0.6s/0.0s/0.0s on the first run.
I have enabled XRay at the SDK level in Startup.cs with:
AWSSDKHandler.RegisterXRayForAllServices();
I can now see the following breakdown in X-Ray:
And after a few (quick) executions I see:
I want to have EVERY single execution sitting at that 163ms and bypass the nearly 4s entirely.
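To see how often this happens, the "Init Duration" field quoted above can be pulled out of the REPORT lines in CloudWatch Logs. A small sketch in Python, assuming the log lines have already been exported as text; the function names are made up for this example.

```python
import re

# REPORT lines include "Init Duration: <n> ms" only when an init phase was reported.
_INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def init_duration_ms(report_line):
    """Return the init duration in milliseconds, or None if the line has none."""
    match = _INIT_RE.search(report_line)
    return float(match.group(1)) if match else None


def lines_with_init(lines):
    """Return (line, init_ms) pairs for every line that reports an init phase."""
    pairs = []
    for line in lines:
        init_ms = init_duration_ms(line)
        if init_ms is not None:
            pairs.append((line, init_ms))
    return pairs
```

Counting these against the total number of REPORT lines gives a rough rate of requests that paid the init cost.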
I am invoking an AWS Lambda function locally using the aws-sam CLI and I have set the Timeout property to 900 seconds, but it still shows a function timeout error. However, when I invoked this function through the Lambda handler in the AWS Console, 900 seconds were enough for inference.
Please help me figure out a solution for this issue. Also, what is the maximum value I can set for Timeout?
AWS Lambda functions (as at July 2021) can only run for a maximum of 15 minutes (which is 900 seconds).
Some people do 'interesting' things like:
Call another Lambda function to continue the work, or
Use AWS Step Functions to orchestrate multiple AWS Lambda functions
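Both workarounds rely on the function stopping cleanly before the limit and reporting where it got to. A minimal sketch of that pattern in Python, assuming the work can be split into independent items; the 60 second safety margin and the event shape (`items`, `start`) are assumptions made for this example.

```python
SAFETY_MARGIN_MS = 60_000  # stop while there is still a minute left (assumed margin)


def process_until_deadline(items, start, context, work, margin_ms=SAFETY_MARGIN_MS):
    """Process items from `start` until done or until time runs short.

    Returns the index of the first unprocessed item.
    """
    index = start
    while index < len(items):
        # The Lambda context object reports the time left before the timeout.
        if context.get_remaining_time_in_millis() < margin_ms:
            break
        work(items[index])
        index += 1
    return index


def handler(event, context):
    items = event["items"]
    next_index = process_until_deadline(items, event.get("start", 0), context, print)
    # A caller (another invocation, or a Step Functions loop) can resume from "start".
    return {"items": items, "start": next_index, "done": next_index >= len(items)}
```

This only helps when a single item finishes well inside the margin; one long, uninterruptible computation does not fit the pattern.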
However, it would appear that your use-case is Machine Learning, which does not like to have operations stopped in the middle of processing. Therefore, AWS Lambda is not suitable for your use-case.
Instead, I would recommend using Amazon EC2 spot instances, which will likely be lower-cost for your use-case. While spot instances might occasionally be terminated, your use-case can probably handle the need to re-run some processing if this happens.
I was trying to set up a Lambda with provisioned concurrency. I enabled this feature for the latest version of my Lambda function.
After that, I ran the function and looked at the traces in AWS X-Ray. I saw that my function still has an initialization phase, although it should already be warm with provisioned concurrency.
Right after the first start, I ran it twice more and it was warm as expected (a Lambda stays warm after its first start by default, even without provisioned concurrency).
I waited 15 minutes and started my Lambda again, and it still shows initialization time in the logs. It does not stay warm with provisioned concurrency as expected and always has initialization time.
How can I resolve this?