I have a Lambda function (invoked via API Gateway) that accesses an Aurora cluster in private subnets. There are no errors in CloudWatch; the function simply times out.
If I invoke the function several times, at around 5 concurrent executions it starts timing out, and API Gateway returns 502 errors.
I know there can be issues with cold-start times when accessing a VPC, due to ENIs being created, and I need to make sure there are enough IPs for the ENIs. But using the formula from AWS (my function has 512 MB, and the private subnets have a /24 range, meaning 254 usable IPs):
IPs for ENIs = Projected peak concurrent executions * (Memory in GB / 3 GB)
254 = Projected peak concurrent executions * (512/3000)
Projected peak concurrent executions ≈ 1488
That is way higher than ~5. Have I missed something? Do I somehow need to create the ENIs manually, or is making sure I have enough available IPs enough?
Any guidance would be appreciated.
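For reference, that arithmetic can be checked with a short script (the figures below are the ones from the question: a 512 MB function and a /24 subnet with 254 usable IPs), which also makes it easy to re-run with other memory sizes or subnet ranges:

```python
# Sanity-check the legacy ENI capacity formula from the AWS docs:
#   ENI capacity = projected peak concurrent executions * (memory in GB / 3 GB)

def peak_concurrency_for_ips(usable_ips: int, memory_mb: int) -> int:
    """Invert the formula: how many concurrent executions fit in the IP budget."""
    memory_gb = memory_mb / 1000.0
    return int(usable_ips / (memory_gb / 3.0))

print(peak_concurrency_for_ips(254, 512))  # far above the ~5 where timeouts start
```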
We have a service built in AWS that only gets traffic for a few minutes in the entire day, and then there is no traffic at all. During the burst we get traffic at, say, 200 TPS; otherwise traffic is almost zero for the entire day. The DynamoDB table has auto scaling enabled.
What I wanted to know is how we should set the minimum capacity units (minRCU and minWCU). Should they be determined by the most traffic we expect or by the minimum traffic we receive? If I go with minimum traffic, say 10, and set utilization to 50%, then I see that some events get throttled, since autoscaling takes time to increase the capacity units. But setting the minimum capacity units according to the most traffic we receive increases the cost of DynamoDB, in which case we incur cost even when we are not using DynamoDB at all. So, are there any best practices for this case?
For your situation, you might be better off going with on-demand mode.
DynamoDB on-demand offers pay-per-request pricing for read and write requests so that you pay only for what you use.
This frees you from managing RCUs, WCUs, and autoscaling; there is no need for proactive scaling.
Be sure to review the considerations before making that change.
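To put rough numbers on this, here is a back-of-the-envelope comparison for the bursty pattern described (200 TPS of writes for roughly ten minutes a day). The prices used are assumptions based on typical us-east-1 list prices and should be checked against current AWS pricing:

```python
# Rough cost comparison: provisioned-at-peak vs. on-demand for bursty writes.
# Assumed us-east-1 list prices (verify against current AWS pricing):
PROVISIONED_WCU_PER_HOUR = 0.00065   # $ per WCU-hour
ON_DEMAND_PER_MILLION_WRITES = 1.25  # $ per million write request units

peak_tps = 200
burst_minutes_per_day = 10
days = 30

# Provisioned at peak: pay for 200 WCU around the clock.
provisioned_monthly = peak_tps * PROVISIONED_WCU_PER_HOUR * 24 * days

# On-demand: pay only for the writes actually made during the bursts.
writes_per_month = peak_tps * 60 * burst_minutes_per_day * days
on_demand_monthly = writes_per_month / 1e6 * ON_DEMAND_PER_MILLION_WRITES

print(f"provisioned at peak: ${provisioned_monthly:.2f}/month")
print(f"on-demand:           ${on_demand_monthly:.2f}/month")
```

With these assumed numbers, on-demand is dramatically cheaper for a workload that is idle most of the day.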
If you do not have consistent traffic, it's better to set the minimum close to what the burst is: since it can take around 5 minutes before scaling up happens, you might find your burst capacity (credits) depleted before the table scales.
I have a Serverless .net core web api lambda application deployed on AWS.
I have this sitting inside a VPC since it accesses an Elasticsearch Service domain in that same VPC.
I have two API microservices that connect to the Elasticsearch service.
After a period of non-use (4 hours, 6 hours, 18 hours; I'm not sure exactly, it seems random), the function becomes unresponsive and I get a 504 gateway timeout error ("Endpoint cannot be found").
I read somewhere that if "idle" for too long, the ENI is released back into the AWS system and that triggering the Lambda again should start it up.
I can't seem to "wake" up the function by calling it as it keeps timing out with the above error (I have also increased the timeouts from default).
Here's the kicker: if I make any changes to the specific Lambda function and save them (even something as simple as changing the timeout value), my API calls (BOTH of them, even though they are different Lambdas) start working again, as if it has "kicked" back in. Obviously the changes do this, but why?
Obviously I don't want timeouts in a production environment regardless of how much, OR how little the lambda or API call is used.
I need a bulletproof solution to this. Surely it's a config issue of some description but I'm just not sure where to look.
I have altered route tables, public/private subnets, CIDR blocks, internet gateways, NAT, etc. for the VPC. This all works, but these two Lambdas, which require VPC access, keep falling "asleep" somehow.
This is because of Lambda cold starts.
There is a feature released at re:Invent 2019: provisioned concurrency for Lambda (not to be confused with reserved concurrency).
Set provisioned concurrency to at least 1 (or the number of requests you need to serve in parallel) to keep the Lambda warm and always ready to serve requests.
Ref: https://aws.amazon.com/blogs/aws/new-provisioned-concurrency-for-lambda-functions/
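A minimal boto3 sketch of turning this on (the function name and alias below are placeholders; note that provisioned concurrency must target a published version or an alias, not $LATEST):

```python
# Hedged sketch: enable provisioned concurrency via boto3.

def provisioned_concurrency_request(function_name, qualifier, units):
    """Build the kwargs for put_provisioned_concurrency_config."""
    return {
        "FunctionName": function_name,
        "Qualifier": qualifier,  # a published version or alias, not $LATEST
        "ProvisionedConcurrentExecutions": units,
    }

def enable_provisioned_concurrency(function_name, qualifier, units=1):
    import boto3  # imported lazily; actually calling this needs AWS credentials
    client = boto3.client("lambda")
    return client.put_provisioned_concurrency_config(
        **provisioned_concurrency_request(function_name, qualifier, units)
    )

# Example (placeholder names):
# enable_provisioned_concurrency("my-vpc-function", "prod", units=1)
```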
For more context: Lambda in a VPC uses Hyperplane ENIs, and functions in the same account that share the same security group:subnet pairing use the same network interfaces.
If the Lambda functions in an account go idle for some time (typically no usage for ~40 minutes across all functions using that ENI; I got this figure from AWS support), the service will reclaim the unused Hyperplane resources, so very infrequently invoked functions may still see longer cold-start times.
Ref: https://aws.amazon.com/blogs/compute/announcing-improved-vpc-networking-for-aws-lambda-functions/
I understand that AWS Lambda is a serverless concept wherein a piece of code can be triggered by some event.
I want to understand how Lambda handles scaling.
For example, suppose my Lambda function sits inside a VPC subnet because it wants to access VPC resources, and that subnet has a CIDR of 192.168.1.0/24, which results in 251 available IPs after subtracting the 5 AWS-reserved IPs.
Would that mean that if my Lambda function gets 252 invocations at the exact same time, only 251 of the requests would be served, and 1 would either time out or get executed once one of the 251 running functions completes?
Does the Subnet size matter for the AWS Lambda scaling?
I am following this reference doc, which mentions concurrent execution limits per region.
Can I assume that, irrespective of whether an AWS Lambda function is outside a VPC ("No VPC") or inside a VPC subnet, it will scale to the limits mentioned in the doc?
Vladyslav's answer is still technically correct (subnet size does matter), but things have changed significantly since it was written, and subnet size is much less of a consideration. See the AWS announcement:
Because the network interfaces are shared across execution environments, typically only a handful of network interfaces are required per function. Every unique security group:subnet combination across functions in your account requires a distinct network interface. If a combination is shared across multiple functions in your account, we reuse the same network interface across functions.
Your function scaling is no longer directly tied to the number of network interfaces, and Hyperplane ENIs can scale to support large numbers of concurrent function executions.
Yes, you are right. Subnet size definitely does matter; you have to be careful with your CIDR blocks. As for that last invocation (the 252nd), it depends on how your Lambda is invoked: synchronously (e.g. API Gateway) or asynchronously (e.g. SQS). If it is invoked synchronously, it will just be throttled and your API will respond with HTTP status 429, which stands for "too many requests". If it is asynchronous, it will be throttled and retried over a window of up to six hours. You can find a more detailed description on this page.
I also recently published a related post on my blog, which you may find useful.
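For the synchronous case, clients are expected to handle that 429 themselves. Here is a hedged sketch of retrying with exponential backoff and jitter (the callable passed in is a placeholder for whatever invokes your API Gateway endpoint):

```python
import random
import time

def with_backoff(call, max_attempts=5, base_delay=0.1):
    """Retry `call` on throttling (HTTP 429), backing off exponentially with jitter."""
    for attempt in range(max_attempts):
        status, body = call()
        if status != 429:
            return status, body
        # Full jitter: sleep a random amount up to base_delay * 2^attempt.
        time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    return status, body  # give up and surface the last throttled response

# Tiny demonstration with a stub that throttles twice, then succeeds:
responses = iter([(429, ""), (429, ""), (200, "ok")])
status, body = with_backoff(lambda: next(responses), base_delay=0.001)
print(status, body)  # 200 ok
```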
I have an AWS Lambda function set up with a trigger from an SQS queue. Currently the queue has about 1.3m messages available. According to CloudWatch, the Lambda function has only ever reached 431 invocations in a given minute. I have read that Lambda supports 1000 concurrent executions at a time, so I'm not sure why it would be maxing out at 431 in a given minute. It also looks like my function only runs for about 5.55s on average, so each of those 1000 available concurrent slots should be turning over multiple times per minute, therefore giving a much higher rate of invocations.
How can I figure out what is going on here and get my Lambda function to process through that SQS queue in a more timely manner?
The 1000 concurrent execution limit you mention assumes that you have provided enough capacity.
Take a look at this, particularly the last bit.
https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
If your Lambda function accesses a VPC, you must make sure that your VPC has sufficient ENI capacity to support the scale requirements of your Lambda function. You can use the following formula to approximately determine the ENI capacity.
Projected peak concurrent executions * (Memory in GB / 3 GB)
Where:
Projected peak concurrent executions – Use the information in Managing Concurrency to determine this value.
Memory – The amount of memory you configured for your Lambda function.
The subnets you specify should have sufficient available IP addresses to match the number of ENIs.
We also recommend that you specify at least one subnet in each Availability Zone in your Lambda function configuration. By specifying subnets in each of the Availability Zones, your Lambda function can run in another Availability Zone if one goes down or runs out of IP addresses.
Also read this article which points out many things that might be affecting you: https://read.iopipe.com/5-things-to-know-about-lambda-the-hidden-concerns-of-network-resources-6f863888f656
As a last note, make sure your SQS Lambda trigger has a batchSize of 10 (the maximum available).
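The expected ceiling can be estimated from the numbers in the question (5.55 s average duration, up to 1000 concurrent executions, batch size 10). The observed 431 invocations/minute is far below it, and implies the effective concurrency is being capped at roughly 40, which is consistent with a capacity or scaling bottleneck rather than the function itself:

```python
# Estimate theoretical Lambda/SQS throughput vs. what CloudWatch shows.
avg_duration_s = 5.55        # from the question
concurrency_limit = 1000     # default regional concurrent-execution limit
batch_size = 10              # max SQS batch size at the time
observed_invocations_per_min = 431

invocations_per_min = concurrency_limit * 60 / avg_duration_s
messages_per_min = invocations_per_min * batch_size
implied_concurrency = observed_invocations_per_min * avg_duration_s / 60

print(f"theoretical: ~{invocations_per_min:.0f} invocations/min "
      f"(~{messages_per_min:.0f} messages/min)")
print(f"implied concurrency at 431 invocations/min: ~{implied_concurrency:.0f}")
```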
Good morning! Could you please help us with the following problem?
I have an API Gateway + Java Lambda handler. This Lambda uses an HTTP connection to call a REST API on the internet.
When we use this Lambda without a VPC, it works fine. But when we use a VPC with internet access configured, the Lambda sometimes fails with timeout errors. It fails on about 20% of all requests (80% of requests work fine) with the following errors in the log:
REPORT RequestId: 16214561-b09a-11e6-a762-7546f12e61bd Duration: 15000.26 ms Billed Duration: 15000 ms Memory Size: 512 MB Max Memory Used: 47 MB
09:57:49
2016-11-22T09:57:49.245Z 16214561-b09a-11e6-a762-7546f12e61bd Task timed out after 15.00 seconds
According to my logs, the Lambda cannot send the GET request. I'm not sure where the problem is: is this a Lambda issue, a VPC issue, or some configuration issue?
I also tried many different REST API endpoints, so it's definitely not an endpoint issue.
Appreciate any help.
When you place a Lambda function inside your VPC, it no longer has direct access to anything outside the VPC, including the internet. To enable your Lambda function to reach resources outside the VPC, add a NAT Gateway in a public subnet and route the Lambda's private subnets through it.
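A minimal CloudFormation sketch of that setup (the resource names and the subnet/route-table references are illustrative; the key point is that the NAT Gateway lives in a public subnet while the Lambda's subnets stay private and route through it):

```yaml
# Illustrative fragment: NAT Gateway in a public subnet, with the private
# route table (used by the Lambda's subnets) defaulting to it.
NatEip:
  Type: AWS::EC2::EIP
  Properties:
    Domain: vpc

NatGateway:
  Type: AWS::EC2::NatGateway
  Properties:
    SubnetId: !Ref PublicSubnet           # must be a public subnet
    AllocationId: !GetAtt NatEip.AllocationId

PrivateDefaultRoute:
  Type: AWS::EC2::Route
  Properties:
    RouteTableId: !Ref PrivateRouteTable  # route table of the Lambda's subnets
    DestinationCidrBlock: 0.0.0.0/0
    NatGatewayId: !Ref NatGateway
```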
The problem is solved.
The Lambda's VPC configuration had a public subnet attached. (Lambda's ENIs never get public IPs, so a route through an internet gateway doesn't work; the function's subnets must be private ones that route through a NAT.)
Thanks to #Michael-sqlbot
I had pretty much the same issue a few months ago, and here is my solution:
Assuming you set up your Lambda manually: under Configuration -> Advanced settings you will find the VPC, and there you choose the subnets and security groups.
The subnets you select should be reachable from the other services the Lambda function talks to. In your case the Lambda uses an HTTP connection to an internet REST API, which is fine, but you may also need a DB connection to RDS, or triggers from SQS or SNS, so make sure the subnet is correct.
The security groups are more important. Again, in your case you need access to the internet, so ensure the security group's outbound rules allow external connections. Normally I allow all ports and all destinations for simplicity; of course, you can limit it to port 80 and the API's IP addresses you need.
Since the executor is "locked" behind a VPC, all internet communications are blocked.
That results in any HTTP(S) calls timing out, as the request packets never reach the destination.
That is why all actions done by the aws-sdk result in a timeout.
Please refer to https://stackoverflow.com/a/39206646
From your log,
Billed Duration: 15000 ms
Memory Size: 512 MB
Max Memory Used: 47 MB
Solution:
It is a timeout issue. You need to increase the execution timeout from 15 seconds to 30 seconds, or more if necessary.
In some cases you may also need to increase the memory size; it can have an effect too. But I think time is the main factor for you, not memory size.
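If you prefer making that change from code rather than the console, here is a hedged boto3 sketch (the function name is a placeholder):

```python
# Hedged sketch: raise a Lambda function's timeout (and optionally memory) via boto3.

def timeout_update_request(function_name, timeout_s=30, memory_mb=None):
    """Build the kwargs for update_function_configuration."""
    req = {"FunctionName": function_name, "Timeout": timeout_s}
    if memory_mb is not None:
        req["MemorySize"] = memory_mb
    return req

def raise_lambda_timeout(function_name, timeout_s=30, memory_mb=None):
    import boto3  # imported lazily; actually calling this needs AWS credentials
    client = boto3.client("lambda")
    return client.update_function_configuration(
        **timeout_update_request(function_name, timeout_s, memory_mb)
    )

# Example (placeholder name):
# raise_lambda_timeout("my-java-handler", timeout_s=30)
```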
For the timing and testing issues, you can go through the following:
Q: How long can an AWS Lambda function execute?
Solution: All calls made to AWS Lambda must complete execution within 300 seconds. The default timeout is 3 seconds, but you can set the timeout to any value between 1 and 300 seconds. (Note that this limit has since been raised to 900 seconds.)
To determine why your Lambda function is not working as expected:
You can test your code locally as you would any other Node.js function, you can test it within the Lambda console using the console's test invoke functionality, or you can use the AWS CLI Invoke command. Each time the code is executed in response to an event, it writes a log entry into the log group associated with the Lambda function, which is /aws/lambda/<function-name>.
If you see a timeout exceeded error in the log, the run time of your function code exceeds your timeout setting. This may be because the timeout is too low, or because the code is taking too long to execute.
As a solution:
Test your code with different memory settings.
If your code is taking too long to execute, it could be that it does not have enough compute resources to execute its logic. Try increasing the memory allocated to your function and testing the code again, using the Lambda console's test invoke functionality. You can see the memory used, code execution time, and memory allocated in the function log entries. Changing the memory setting can change how you are charged for duration. For information about pricing, see AWS Lambda.
Resource link: Troubleshooting and Monitoring AWS Lambda Functions with Amazon CloudWatch
For testing, a full code example is given here: http://qiita.com/c9katayama/items/b9a30cdfaaa91cba23ad