I have enabled provisioned concurrency (5 units) on one of my lambda functions which is invoked via API Gateway requests.
My impression of this was that it eliminates "cold starts" or "inits" altogether yet when I hit my appropriate Api endpoint after an "idle period" X-Ray & Cloudwatch show me what I would have thought is a cold start?
Init Duration: 2379.57 ms
Subsequent requests don't have this and my total time for the request goes from approx 3.8s to approx 200ms.
My understanding was that provisioning the concurrency would automatically hit the provisioned ones first and thus ALWAYS produce that latter 200ms scenario (assuming we are not exceeding concurrency count). I've created a separate lambda and am the only one using it so I know that is not the issue. I've created an alias, which point to to a version, not $LATEST.
Lambda metrics display "Invocations" "ProvisionedConcurrencyInvocations" with the appropriate colour on the chart which indicated provisioned concurrency.
Why am I getting what is seemingly a cold start still?
If all the above seems OK and my understanding is indeed what should be happening then would it be related to the services used in the lambda itself?
I use S3, SQS and DynamoDB.
Lambda is a c# .net core 3.1 function
Edit #1 -
I Should clarify that the function is actually an asp.net core 3.1 web api serverless and has all the usual attached ConfigureServices/Configure points as well as Controller classes.
I have timed and written to console the times it takes to ConfigureServices/Configure and Constructor of the controller. Times are .6s/0.0s/0.0s at first run.
I have enabled XRay at the SDK level in Startup.cs with:
AWSSDKHandler.RegisterXRayForAllServices();
I can now see the following breakdown in X-Ray:
And after a few (quick) executions I see:
I want to have EVERY single execution sitting at that 163ms and bypass the nearly 4s entirely.
Related
I am new to AWS and having some difficulty tracking down and resolving some latency we are seeing on our API. Looking for some help diagnosing and resolving the issue.
Here is what we are seeing:
If an endpoint hasn't been hit recently, then on the first request we see a 1.5-2s delay marked as "Initialization" in the CloudWatch Trace.
I do not believe this is a cold start, because each endpoint is configured to have 1 provisioned concurrency, so we should not get a cold start unless there are 2 simultaneous requests. Also, the billed duration includes this initialization period.
Cold start means when your first request hit to aws lambda it will be prepared container to run your scripts,this will take some time and your request will delay.
When second request hit lambda and lambda and container is already up and runing will be process quickly
https://aws.amazon.com/blogs/compute/operating-lambda-performance-optimization-part-1/
This is the default behavior of cold start, but since you said that you were using provisioned concurrency, that shouldn't happen.
Provisioned concurrency has a delay to activate in the account, you can follow this steps to verify if this lambda used on demand or provisioned concurrency.
AWS Lambda sets an ENV called AWS_LAMBDA_INITIALIZATION_TYPE that contain the values on_demand or provisioned_concurrency.
I have created a model endpoint which is InService and deployed on an ml.m4.xlarge instance. I am also using API Gateway to create a RESTful API.
Questions:
Is it possible to have my model endpoint only Inservice (or on standby) when I receive inference requests? Maybe by writing a lambda function or something that turns off the endpoint (so that it does not keep accumulating the per hour charges)
If q1 is possible, would this have some weird latency issues on the end users? Because it usually takes a couple of minutes for model endpoints to be created when I configure them for the first time.
If q1 is not possible, how would choosing a cheaper instance type affect the time it takes to perform inference (Say I'm only using the endpoints for an application that has a low number of users).
I am aware of this site that compares different instance types (https://aws.amazon.com/sagemaker/pricing/instance-types/)
But, does having a moderate network performance mean that the time to perform realtime inference may be longer?
Any recommendations are much appreciated. The goal is not to burn money when users are not requesting for predictions.
How large is your model? If it is under the 50 MB size limit required by AWS Lambda and the dependencies are small enough, there could be a way to rely directly on Lambda as an execution engine.
If your model is larger than 50 MB, there might still be a way to run it by storing it on EFS. See EFS for Lambda.
If you're willing to wait 5-10 minutes for SageMaker to launch, you can accomplish this by doing the following:
Set up a Lambda function (or create a method in an existing function) to check your endpoint status when the API is called. If the status != 'InService', call the function in #2.
Create another method that when called launches your endpoint and creates a metric alarm in Cloudwatch to monitor your primary lambda function's invocations. When the threshold falls below your desired invocations / period, it will call the function in #3.
Create a third method to delete your endpoint and the alarm when called. Technically, the alarm can't call a Lambda function, so you'll need to create a topic in SNS and subscribe this function to it.
Good luck!
So our project was using Hangfire to dynamically schedule tasks but keeping in mind auto scaling of server instances we decided to do away with it. I was looking for cloud native serverless solution and decided to use CloudWatch Events with Lambda. I discovered later on that there is an upper limit on the number of Rules that can be created (100 per account) and that wouldn't scale automatically. So now I'm stuck and any suggestions would be great!
As per CloudWatch Events documentation you can request a limit increase.
100 per region per account. You can request a limit increase. For
instructions, see AWS Service Limits.
Before requesting a limit increase, examine your rules. You may have
multiple rules each matching to very specific events. Consider
broadening their scope by using fewer identifiers in your Event
Patterns in CloudWatch Events. In addition, a rule can invoke several
targets each time it matches an event. Consider adding more targets to
your rules.
If you're trying to create a serverless task scheduler one possible way could be:
CloudWatch Event that triggers a lambda function every minute.
Lambda function reads a DynamoDB table and decide which actions need to be executed at that time.
Lambda function could dispatch the execution to other functions or services.
So I decided to do as Diego suggested, use CloudWatch Events to trigger a Lambda every minute which would query DynamoDB to check for the tasks that need to be executed.
I had some concerns regarding the data that would be fetched from dynamoDb (duplicate items in case of longer than 1 minute of execution), so decided to set the concurrency to 1 for that Lambda.
I also had some concerns regarding executing those tasks directly from that Lambda itself (timeouts and tasks at the end of a long list) so what I'm doing is pushing the tasks to SQS each separately and another Lambda is triggered by the SQS to execute those tasks parallely. So far results look good, I'll keep updating this thread if anything comes up.
I'm learning about AWS Lambda and I'm worried about synchronized real-time requests.
The fact the lambda has a "cold start" it doesn't sounds good for handling GET petitions.
Imagine a user is using the application and do a GET HTTP Request to get a Product or a list of Products, if the lambda is sleeping, then it will take 10 seconds to respond, I don't see this as an acceptable response time.
Is it good or bad practice to use AWS Lambda for classic (sync responses) API Rest?
Like most things, I think you should measure before deciding. A lot of AWS customers use Lambda as the back-end for their webapps quite successfully.
There's a lot of discussion out there on Lambda latency, for example:
2017-04 comparing Lambda performance using Node.js, Java, C# or Python
2018-03 Lambda call latency
2019-09 improved VPC networking for AWS Lambda
2019-10 you're thinking about cold starts all wrong
In December 2019, AWS Lambda introduced Provisioned Concurrency, which improves things. See:
2019-12 AWS Lambda announces Provisioned Concurrency
2020-09 AWS Lambda Cold Starts: Solving the Problem
You should measure latency for an environment that's representative of your app and its use.
A few things that are important factors related to request latency:
cold starts => higher latency
request patterns are important factors in cold starts
if you need to deploy in VPC (attachment of ENI => higher cold start latency)
using CloudFront --> API Gateway --> Lambda (more layers => higher latency)
choice of programming language (Java likely highest cold-start latency, Go lowest)
size of Lambda environment (more RAM => more CPU => faster)
Lambda account and concurrency limits
pre-warming strategy
Update 2019-12: see Predictable start-up times with Provisioned Concurrency.
Update 2021-08: see Increasing performance of Java AWS Lambda functions using tiered compilation.
As an AWS Lambda + API Gateway user (with Serverless Framework) I had to deal with this too.
The problem I faced:
Few requests per day per lambda (not enough to keep lambdas warm)
Time critical application (the user is on the phone, waiting for text-to-speech to answer)
How I worked around that:
The idea was to find a way to call the critical lambdas often enough that they don't get cold.
If you use the Serverless Framework, you can use the serverless-plugin-warmup plugin that does exactly that.
If not, you can copy it's behavior by creating a worker that will invoke the lambdas every few minutes to keep them warm. To do this, create a lambda that will invoke your other lambdas and schedule CloudWatch to trigger it every 5 minutes or so. Make sure to call your to-keep-warm lambdas with a custom event.source so you can exit them early without running any actual business code by putting the following code at the very beginning of the function:
if (event.source === 'just-keeping-warm) {
console.log('WarmUP - Lambda is warm!');
return callback(null, 'Lambda is warm!');
}
Depending on the number of lamdas you have to keep warm, this can be a lot of "warming" calls. AWS offers 1.000.000 free lambda calls every month though.
We have used AWS Lambda quite successfully with reasonable and acceptable response times. (REST/JSON based API + AWS Lambda + Dynamo DB Access).
The latency that we measured always had the least amount of time spent in invoking functions and large amount of time in application logic.
There are warm up techniques as mentioned in the above posts.
Very interested in getting hands-on with Serverless in 2018. Already looking to implement usage of AWS Lambda in several decentralized app projects. However, I don't yet understand how you can prevent abuse of your endpoint from a 3rd-party app (perhaps even a competitor), from driving up your usage costs.
I'm not talking about a DDoS, or where all the traffic is coming from a single IP, which can happen on any network, but specifically having a 3rd-party app's customers directly make the REST calls, which cause your usage costs to rise, because their app is piggy-backing on your "open" endpoints.
For example:
I wish to create an endpoint on AWS Lambda to give me the current price of Ethereum ETH/USD. What would prevent another (or every) dapp developer from using MY lambda endpoint and causing excessive billing charges to my account?
When you deploy an endpoint that is open to the world, you're opening it to be used, but also to be abused.
AWS provides services to avoid common abuse methods, such as AWS Shield, which mitigates against DDoS, etc., however, they do not know what is or is not abuse of your Lambda function, as you are asking.
If your Lambda function is private, then you should use one of the API gateway security mechanisms to prevent abuse:
IAM security
API key security
Custom security authorization
With one of these in place, your Lambda function can only by called by authorized users. Without one of these in place, there is no way to prevent the type of abuse you're concerned about.
Unlimited access to your public Lambda functions - either by bad actors, or by bad software developed by legitimate 3rd parties, can result in unwanted usage of billable corporate resources, and can degrade application performance. It is important to you consider ways of limiting and restricting access to your Lambda clients as part of your systems security design, to prevent runaway function invocations and uncontrolled costs.
Consider using the following approach to preventing execution "abuse" of your Lambda endpoint by 3rd party apps:
One factor you want to control is concurrency, or number of concurrent requests that are supported per account and per function. You are billed per request plus total memory allocation per request, so this is the unit you want to control. To prevent run away costs, you prevent run away executions - either by bad actors, or by bad software cause by legitimate 3rd parties.
From Managing Concurrency
The unit of scale for AWS Lambda is a concurrent execution (see
Understanding Scaling Behavior for more details). However, scaling
indefinitely is not desirable in all scenarios. For example, you may
want to control your concurrency for cost reasons, or to regulate how
long it takes you to process a batch of events, or to simply match it
with a downstream resource. To assist with this, Lambda provides a
concurrent execution limit control at both the account level and the
function level.
In addition to per account and per Lambda invocation limits, you can also control Lambda exposure by wrapping Lambda calls in an AWS API Gateway, and Create and Use API Gateway Usage Plans:
After you create, test, and deploy your APIs, you can use API Gateway
usage plans to extend them as product offerings for your customers.
You can provide usage plans to allow specified customers to access
selected APIs at agreed-upon request rates and quotas that can meet
their business requirements and budget constraints.
What Is a Usage Plan? A usage plan prescribes who can access one or
more deployed API stages— and also how much and how fast the caller
can access the APIs. The plan uses an API key to identify an API
client and meters access to an API stage with the configurable
throttling and quota limits that are enforced on individual client API
keys.
The throttling prescribes the request rate limits that are applied to
each API key. The quotas are the maximum number of requests with a
given API key submitted within a specified time interval. You can
configure individual API methods to require API key authorization
based on usage plan configuration. An API stage is identified by an
API identifier and a stage name.
Using API Gateway Limits to create Gateway Usage Plans per customer, you can control API and Lambda access prevent uncontrolled account billing.
#Matt answer is correct, yet incomplete.
Adding a security layer is a necessary step towards security, but doesn't protect you from authenticated callers, as #Rodrigo's answer states.
I actually just encountered - and solved - this issue on one of my lambda, thanks to this article: https://itnext.io/the-everything-guide-to-lambda-throttling-reserved-concurrency-and-execution-limits-d64f144129e5
Basically, I added a single line on my serverless.yml file, in my function that gets called by the said authirized 3rd party:
reservedConcurrency: 1
And here goes the whole function:
refresh-cache:
handler: src/functions/refresh-cache.refreshCache
# XXX Ensures the lambda always has one slot available, and never use more than one lambda instance at once.
# Avoids GraphCMS webhooks to abuse our lambda (GCMS will trigger the webhook once per create/update/delete operation)
# This makes sure only one instance of that lambda can run at once, to avoid refreshing the cache with parallel runs
# Avoid spawning tons of API calls (most of them would timeout anyway, around 80%)
# See https://itnext.io/the-everything-guide-to-lambda-throttling-reserved-concurrency-and-execution-limits-d64f144129e5
reservedConcurrency: 1
events:
- http:
method: POST
path: /refresh-cache
cors: true
The refresh-cache lambda was invoked by a webhook triggered by a third party service when any data change. When importing a dataset, it would for instance trigger as much as 100 calls to refresh-cache. This behaviour was completely spamming my API, which in turn was running requests to other services in order to perform a cache invalidation.
Adding this single line improved the situation a lot, because only one instance of the lambda was running at once (no concurrent run), the number of calls was divided by ~10, instead of 50 calls to refresh-cache, it only triggered 3-4, and all those call worked (200 instead of 500 due to timeout issue).
Overall, pretty good. Not yet perfect for my workflow, but a step forward.
Not related, but I used https://epsagon.com/ which tremendously helped me figuring out what was happening on AWS Lambda. Here is what I got:
Before applying reservedConcurrency limit to the lambda:
You can see that most calls fail with timeout (30000ms), only the few first succeed because the lambda isn't overloaded yet.
After applying reservedConcurrency limit to the lambda:
You can see that all calls succeed, and they are much faster. No timeout.
Saves both money, and time.
Using reservedConcurrency is not the only way to deal with this issue, there are many other, as #Rodrigo stated in his answer. But it's a working one, that may fit in your workflow. It's applied on the Lambda level, not on API Gateway (if I understand the docs correctly).