AWS Lambda for mobile app and throttling

According to the docs, "by default, AWS Lambda limits the total concurrent executions across all functions within a given region to 100."
Consider a simple mobile app using Lambda for back-end processing. If I'm understanding the constraint correctly, no more than 100 concurrent executions can happen at one time, meaning that if I have 100 users invoking lambda functions at the same time, there will be throttling constraints?
I understand I can call customer support and increase that limit but is this the correct interpretation of the constraint? How is this supposed to scale to 1000, 10,000 or 1,000,000 users?

Update: Since this answer was written, the default limit for concurrent executions was increased by a factor of 10, from 100 to 1,000. The limit is per account, per region.
By default, AWS Lambda limits the total concurrent executions across all functions within a given region to 1000
http://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html#concurrent-execution-safety-limit (link visited 2017-05-02)
However, as before, this is a protective control, and AWS support will increase the limit if you present them with your use case and it is approved. There is no charge for opening this type of request in the support center, and no charge for the higher limit itself.
The Lambda platform may also allow excursions beyond your limit if it deems that appropriate. The logic behind such an action isn't documented, but a reasonable assumption is that excursions are allowed when the traffic appears to be driven by genuine demand/load, rather than by a runaway loopback condition where Lambda functions invoke more Lambda functions, directly or indirectly.
A fun example of a runaway condition might be something like this: a bucket has a create-object event that invokes a Lambda function, which creates 2 objects in the same bucket... which invokes the same Lambda function 2 times, creating 4 objects... invoking the Lambda function 4 times, creating 8 objects.
On about the 15th iteration, which would only require a matter of seconds, you theoretically would have 32,768 concurrent invocations trying to create 65,536 objects. Real-world traffic ramps up much more slowly, in most cases.
if I have 100 users invoking lambda functions at the same time, there will be throttling constraints
Yes, that's the idea behind "concurrent."
How is this supposed to scale
Nobody said it would, with the limit in place.
This limit is a protective control, not a reflection of an actual limitation of the platform.
But also, how likely is it that your users are making concurrent requests to Lambda? Assuming your Lambda function runs for 100ms, you could handle something like 750 invocations per second within a limit of 100 concurrent invocations at a blocking probability of only 0.1%.
(That's an Erlang B calculation, which seems applicable here. With no random arrivals, of course, the "pure" capacity would be 100 × 10 = 1000 invocations/sec for a 100ms function).
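For anyone who wants to reproduce that figure, here is a minimal sketch of the standard Erlang B recursion in Python. The 75-erlang offered load (750 requests/sec × 0.1 s) and the 100-server limit come from the numbers above; everything else is just the textbook formula, not anything Lambda-specific.

def erlang_b(offered_erlangs, servers):
    # Iterative form of the Erlang B recursion:
    # B(E, 0) = 1;  B(E, m) = E * B(E, m-1) / (m + E * B(E, m-1))
    b = 1.0
    for m in range(1, servers + 1):
        b = offered_erlangs * b / (m + offered_erlangs * b)
    return b

# 750 invocations/sec * 0.1 s average duration = 75 erlangs offered to 100 "servers"
print(erlang_b(75.0, 100))  # roughly 0.001, i.e. ~0.1% blocking probability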

Related

How to tell how often a Lambda is reused?

I know one shouldn't rely on a Lambda being reused, and that's not my goal. I'm just trying to get an idea of how many invocations are handled by the same instance of a Lambda.
Looking at the graphs below, at one point in time there were 5,093 invocations and 57 concurrent executions.
Question
Can I assume all of those invocations were handled by those concurrent executions? Thus, 5093 / 57 = ~89 requests handled by each lambda instance on average?
From the Lambda docs:
Invocations – The number of times your function code is executed, including successful executions and executions that result in a function error. Invocations aren't recorded if the invocation request is throttled or otherwise resulted in an invocation error. This equals the number of requests billed.
and
ConcurrentExecutions – The number of function instances that are processing events. If this number reaches your concurrent executions quota for the Region, or the reserved concurrency limit that you configured on the function, additional invocation requests are throttled.
Based on this, I think your interpretation is correct: you can divide Invocations by ConcurrentExecutions to get the average number of requests each execution context (instance) has handled. Note that there might be a lot of variance behind those averages, which you can't measure from the available metrics; you'd have to generate your own metrics for that.
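If you'd rather compute that ratio programmatically than eyeball the graphs, a rough sketch with boto3 might look like the following; the function name, the one-hour window, and the period are placeholders you'd adjust to match whatever interval your graphs cover.

from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

def lambda_metric(metric_name, stat, function_name="my-function"):
    # Pull a single datapoint covering the last hour (placeholder window).
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric_name,
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=3600,
        Statistics=[stat],
    )
    points = resp["Datapoints"]
    return points[0][stat] if points else 0

invocations = lambda_metric("Invocations", "Sum")
concurrent = lambda_metric("ConcurrentExecutions", "Maximum")
if concurrent:
    print(f"~{invocations / concurrent:.0f} invocations per execution context on average")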

How good are Lambda functions for hitting a REST API after a fixed amount of time?

I am using the distributed scheduler 'Chronos' (a distributed crontab) to hit a REST API a few minutes after a job is added (example: add a job at time T to schedule it at T+5 minutes). This runs on a bigger infrastructure and takes care of fault tolerance and no data loss, but it has a significant cost, so I am looking for an alternative that meets the same requirement. Please help if it can be done using a Lambda function.
It's possible to invoke a Lambda function, block/wait for X seconds, and continue execution, but it's not recommended. You cannot wait for more than 300 seconds, though, as that's the maximum timeout allowed for Lambda functions.
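As a rough illustration of that (not recommended) pattern, a handler could simply sleep and then call the API. The URL and the delay field below are made-up placeholders, and the function's configured timeout has to be at least as long as the wait.

import time
import urllib.request

TARGET_URL = "https://example.com/api/job"  # placeholder endpoint

def handler(event, context):
    delay = event.get("delay_seconds", 60)  # hypothetical event field
    time.sleep(delay)                       # you are billed for the entire wait
    with urllib.request.urlopen(TARGET_URL) as resp:
        return {"status": resp.status}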
Moreover, you will hit AWS's concurrent execution limits and will need to keep asking AWS support to raise them.
Another approach to this problem could be to use an actor-based system such as Akka: create an actor for each job and have it do the work.

AWS Lambda execution duration randomly spikes and causes time-outs

I'm building a serverless web-tracking system which serves its tracking pixel using AWS API Gateway; whenever a tracking request arrives, API Gateway calls a Lambda function that writes the tracking event into a Kinesis stream.
The Lambda function itself does not do anything fancy. It just takes the incoming event (its own argument) and writes it to the stream. Essentially, it's just:
import boto3

kinesis_client = boto3.client("kinesis")
kinesis_stream = "my_stream_name"

def return_tracking_pixel(event, context):
    ...
    new_record = ...(event)
    kinesis_client.put_record(
        StreamName=kinesis_stream,
        Data=new_record,
        PartitionKey=...
    )
    return ...
Sometimes I experience a weird spike in the Lambda execution duration that causes some of my Lambda function invocations to time-out and the tracking requests to be lost.
This is the graph of 1-minute invocation counts of the Lambda function in the affected time period:
Between 20:50 and 23:10 I suddenly see many invocation errors (1-minute error counts):
which are obviously caused by the Lambda execution time-out (maximum duration in 1-minute intervals):
There is nothing weird going on with my Kinesis stream (data-in, number of put records, put_record success count, etc. all look normal), nor with my API GW (the number of invocations corresponds to the number of API GW calls, well within the limits of the API GW).
What could be causing the sudden (and seemingly randomly occurring) spike in the Lambda function execution duration?
EDIT: the Lambda functions are not being throttled either, which was my first idea.
Just to add my 2 cents, because there's not much investigation possible without extra logging or some X-Ray analysis.
AWS Lambda will sometimes force-recycle containers, which feels like a cold start even though your function is being reasonably exercised and kept warm. This brings all the cold-start-related issues, like extra delays for ENIs if your Lambda is attached to a VPC, and so on... but even for a simple function like yours, a 1-second timeout is sometimes too optimistic for a cold start.
I don't know of any documentation on those forced recycles, other than some people having evidence for it.
"We see a forced recycle about 7 times a day." source
"It also appears that even once warmed, high concurrency functions get recycled much faster than those with just a few in memory." source
I wonder how you could confirm this is the case. Perhaps you could check whether the errors appearing in CloudWatch come from log streams (containers) that never appeared before.
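One way to do that (my own suggestion, not something from the question) is to log a per-container identifier: module-level code runs once per container, so a brand-new ID appearing in the logs marks a freshly created or recycled container.

import uuid

# Runs once per container; reused invocations log the same ID,
# while a new ID indicates a cold start or a recycled container.
CONTAINER_ID = str(uuid.uuid4())

def handler(event, context):
    print(f"container_id={CONTAINER_ID} request_id={context.aws_request_id}")
    # ... normal processing ...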

Fastest way to process with AWS Lambda

Currently, I'm implementing a solution based on S3, Lambda and DynamoDB.
My use case is: when a new object is uploaded to S3, a first Lambda function is called; it downloads the new file, splits it into around 100 (or more) parts, and adds additional information to each of them. In the next step, each part is processed by a second Lambda function, and in some cases an insert is performed in DynamoDB.
My question is only about the best way to call the "second lambda", meaning the fastest way. I want to execute 100 Lambda functions (if I had 100 parts to process) at the same time.
I know there are different possibilities:
1) My first Lambda function can push each part as a record into a Kinesis stream, and my second Lambda function will react, retrieve a record, and process it. In this case I don't know whether AWS will launch a new Lambda function each time there is a remaining record in the stream. Maybe there is some limitation...
2) My first Lambda function can publish each part to an SNS topic, and my second Lambda will react to each new message. In this case I have some doubts about the latency (the time between publishing a message to the SNS topic and my second Lambda function being executed).
3) My first Lambda function can launch the second one directly by performing an API call and passing the information. In this case I have no idea whether I can launch 100 Lambda functions at the same time; I think I'll be stuck by a rate limit on the AWS API (I said, I think!).
Does somebody have feedback and maybe advice regarding my use case? Once more, the most important thing for me is to have the fastest processing.
Thanks
Lambda limits are in place to provide some sane defaults but many workloads quickly exceed them. You can request an increase so this will not be a bottleneck for your use case. This document describes the process:
http://docs.aws.amazon.com/lambda/latest/dg/limits.html
I'm not sure how much latency your use case can tolerate but I often use SNS to fan out and the latency is usually sub-second to the next invocation (unless it's Java/coldstart).
If latency is extremely sensitive then you'd probably want to invoke Lambdas directly using Invoke with the InvocationType set to "Event". This would minimize blocking while you Invoke 100 times. You could also thread these Invoke calls within your main Lambda function to further increase parallelism if you want to hyper-optimize.
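A minimal sketch of that direct fan-out with boto3 is below; the function name and payload shape are placeholders, but InvocationType="Event" is what makes each call return immediately instead of waiting for the result.

import json
import boto3

lambda_client = boto3.client("lambda")

def fan_out(parts, function_name="process-part"):  # placeholder function name
    for part in parts:
        lambda_client.invoke(
            FunctionName=function_name,
            InvocationType="Event",  # asynchronous: returns without waiting for completion
            Payload=json.dumps(part).encode("utf-8"),
        )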
Cold containers will occasionally add latency to your invocations. If milliseconds count, this can become tricky. People who are trying to hyper-optimize Lambda processing times will sometimes schedule executions of their Lambda function with a "heartbeat" event that returns immediately (so the processing time is cheap). Those containers then remain "warm" for a short period, which allows them to pick up your events without incurring "cold startup" time. Java containers are much slower to spin up cold than Node containers (I assume Python is probably about as fast as Node, though I haven't tested).
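A heartbeat handler along those lines can be as simple as the sketch below, assuming a scheduled rule sends an event shaped like {"heartbeat": true}; that event shape is my own assumption, not an AWS convention.

def handler(event, context):
    # Scheduled "heartbeat" events return immediately, keeping the container warm cheaply.
    if event.get("heartbeat"):
        return "warm"
    # ... normal processing for real events ...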

Is it possible to detect an AWS account is nearing the Lambda concurrency limit?

Lambda has some concurrency limits that when hit, cause subsequent invocations to get throttled.
This makes sense, but is it possible to detect this situation ahead of time and start applying backpressure?
The problem is that (according to the docs) the concurrency limit is per-account, which means a single runaway microservice can block ALL unrelated services.
For example: a lambda fn with an s3 event source could easily lead to API Gateway handlers being throttled and unhappy API users.
Is there any QoS for lambda functions? It'd be great to be able to give public-facing functions priority. (I know the answer is no, but I wish there were.)
Short of that, is it possible to detect that you're nearing this concurrency limit and build backpressure in?
I'm not seeing anything, and the only solution I can think of at the moment is to watch the Throttles metric and, as soon as a throttle happens, toggle some flag somewhere? This adds significant complexity, though...
Any ideas?
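For what it's worth, the flag-toggling idea sketched above could be driven by a CloudWatch alarm on the Throttles metric; the alarm name and the SNS topic ARN in this sketch are placeholders, and whatever consumes the topic would be responsible for actually applying backpressure.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Fire as soon as any throttle is recorded, so a consumer of the SNS topic
# can flip a backpressure flag (the topic ARN below is a placeholder).
cloudwatch.put_metric_alarm(
    AlarmName="lambda-throttles-detected",
    Namespace="AWS/Lambda",
    MetricName="Throttles",
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:throttle-alerts"],
)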