Does AWS Lambda support multi-threading? - amazon-web-services

I am writing an AWS Lambda function that reads the past 1 minute of data from a DB, converts it to JSON, and then pushes it to a Kafka topic.
The lambda will run every 1 minute.
So consider a scenario like this:
At t1, a lambda process, let's say P1, is invoked; it is reading data from t1-1 to t1.
At t2, if P1 has not finished, will a new lambda process be invoked, or will we wait for P1 to finish before invoking another process?
I understand that Lambda supports up to 1000 parallel processes per region, but in this situation the lambda function will already be running in a process.

Lambda does support multi-threading and multi-processing within the same execution (see an example).
However, from the context that you gave, that doesn't seem to be the underlying question.
Concurrency allows you to configure how many instances of a single Lambda function can be in execution at a given time. Quoting the relevant part of the documentation:
Concurrency is the number of requests that your function is serving at any given time. When your function is invoked, Lambda allocates an instance of it to process the event. When the function code finishes running, it can handle another request. If the function is invoked again while a request is still being processed, another instance is allocated, which increases the function's concurrency. The total concurrency for all of the functions in your account is subject to a per-region quota.
Triggers define when a Lambda is executed. If you have multiple events coming in (from SQS, for example) to a Lambda with concurrency > 1, then it's likely that there will be multiple instances of that Lambda running at the same time.
With concurrency=1, if you trigger the Lambda every 1 minute and it takes more than 1 minute to execute and finish, then your processing will lag behind. In other words, future Lambdas will be processing t-2, t-3, and so on.
With concurrency=1, if you want something to be processed every 1 minute, you have to make sure it doesn't take more than 1 minute to process it. With extra concurrency it can take longer.
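To illustrate the first point above, a single invocation can fan its work out across threads. Here is a minimal sketch; the chunking and the process_chunk function are hypothetical placeholders, not from the question:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # Placeholder: in the real function this would serialize the rows
    # to JSON and push them to the Kafka topic.
    return sum(chunk)

def lambda_handler(event, context):
    # Hypothetical: the last minute of rows, split into chunks.
    chunks = [[1, 2], [3, 4], [5, 6]]
    # All chunks are processed concurrently inside this one invocation.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(process_chunk, chunks))
    return {"statusCode": 200, "processed": results}
```

Note that threads only help within one execution; they do not change how many concurrent instances of the function Lambda will run.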

Related

Recursive AWS lambda with updating state

I have an AWS Lambda that polls an external server for new events every 6 hours. On every call, if there are any new events, it publishes the updated total number of events polled to an SNS topic. So I essentially need to call the Lambda at fixed intervals but also pass a counter state across calls.
I'm currently considering the following options:
Store the counter somewhere on EFS/S3, but it seems overkill for a simple number.
EventBridge, which would be fine for scheduling the execution, but doesn't store state across calls.
A Step Function with a loop + wait on the Lambda would do it, but it doesn't seem to be the most efficient/cost-effective way to do it.
Use SQS with a delay so that the Lambda essentially triggers itself, passing the updated state. Again, I don't think this is the most effective option, and to actually get to the 6-hour delay I would have to implement some checks/delays within the Lambda, as the max delay for SQS is 15 minutes.
What would be the best way to do it?
For scheduling Lambda at intervals you can use CloudWatch Events (now part of EventBridge). Scheduling a Lambda with the Serverless framework is a breeze: a cron-type statement can schedule your Lambda call. Here's a guide on scheduling: https://www.serverless.com/framework/docs/providers/aws/events/schedule
As for saving the counter, you can use AWS Systems Manager Parameter Store. It's a simple key-value storage for such a small amount of data.
https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html
Alternatively, you can save it in DynamoDB. Since the data is small and the access frequency is low, you won't be charged much, and there's no hassle of reading files or parsing.
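As a sketch of the Parameter Store option: the parameter name below is hypothetical, and boto3 is imported inside the handler only so the snippet can be read and exercised without AWS credentials.

```python
PARAM_NAME = "/event-poller/counter"  # hypothetical parameter name

def read_counter(ssm):
    # Fetch the stored counter; default to 0 if it doesn't exist yet.
    try:
        resp = ssm.get_parameter(Name=PARAM_NAME)
        return int(resp["Parameter"]["Value"])
    except ssm.exceptions.ParameterNotFound:
        return 0

def write_counter(ssm, value):
    ssm.put_parameter(Name=PARAM_NAME, Value=str(value),
                      Type="String", Overwrite=True)

def lambda_handler(event, context):
    import boto3  # lazy import: see note above
    ssm = boto3.client("ssm")
    total = read_counter(ssm) + event.get("new_events", 0)
    write_counter(ssm, total)
    # Publishing `total` to SNS would go here.
    return {"statusCode": 200, "total": total}
```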

What happens if timeout handler is not cancelled inside lambda function?

I have a lambda function that sets a timeout handler with a certain delay (60 seconds) at the beginning.
I'd like to know the exact behavior of Lambda when the timeout handler is not cancelled by the time the Lambda returns its response (in less than 60 seconds). In particular, when there are hundreds of Lambda invocations, will the uncancelled timeout handler from a previous execution affect the next process that runs on the same instance? More info: the lambda function is invoked asynchronously.
You haven't mentioned which language you're using or provided any code indicating how you're creating timeouts, but the general process is described at AWS Lambda execution environment.
Lambda freezes the execution environment after an invocation. It remains frozen for up to a certain maximum amount of time (15 mins, afaik) and is thawed and re-used if a new invocation arrives quickly enough.
A key quote from the documentation is:
Background processes or callbacks that were initiated by your Lambda function and did not complete when the function ended [will] resume if Lambda reuses the execution environment. Make sure that any background processes or callbacks in your code are complete before the code exits.
As you wrote in the comments, the lambda is written in Python.
This simple example shows that the alarm set in one invocation fires during the next one:
The code:

    import random
    import signal

    def delayed(val):
        print("Delayed:", val)

    def lambda_handler(event, context):
        r = random.random()
        print("Generated", r)
        signal.signal(signal.SIGALRM, lambda *args: delayed(r))
        signal.setitimer(signal.ITIMER_REAL, 1)
        return {'statusCode': 200}
Yields (CloudWatch logs screenshot omitted): the "Delayed:" line from one invocation is printed during a later invocation, carrying the value generated by the earlier one.
Think about the way that AWS implements lambdas:
When a lambda is invoked, a container is started and the environment begins to initialize (this is the cold-start phase).
During this initialization, the Python interpreter starts, and behind the scenes AWS runtime code fetches events from the Lambda service and calls your handler.
This initialization is costly, so AWS prefers to wait with the same "process" for the next event. In the happy flow it arrives "fast enough" after the previous one finished, so the initialization is spared and everyone is happy.
Otherwise, after a short period, the container is shut down.
As long as the interpreter is still alive, the signal we fired in one invocation will leak into the next invocation.
Note also the concurrency of lambdas: two invocations that run in parallel run on different containers, hence on different interpreters, so the alarm will not leak between them.
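The fix for the original question follows directly: disarm the timer before returning. A sketch (the handler body is a placeholder):

```python
import signal

def delayed(*args):
    print("Timeout fired")

def lambda_handler(event, context):
    signal.signal(signal.SIGALRM, delayed)
    signal.setitimer(signal.ITIMER_REAL, 60)  # arm a 60-second timer
    try:
        result = {"statusCode": 200}  # placeholder for the real work
    finally:
        # Disarm the timer so it cannot fire during a later invocation
        # that reuses this (warm) execution environment.
        signal.setitimer(signal.ITIMER_REAL, 0)
    return result
```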

What happens when a lambda dies?

I am new to AWS so I am not sure what the behavior is when the following situation occurs.
Let's say I have a Kinesis stream with JSON data (and let's say every couple of minutes a few thousand messages get inserted).
Now there is a Lambda function that gets invoked every time a new msg is inserted into the Kinesis stream, which reads the msg and does some processing before inserting into Redshift.
So what happens if there is some error and the Lambda function crashes while doing the processing and takes a few minutes or even a couple of hours (I don't know if that's even possible) to come back up? Will it continue reading Kinesis from the last unread message, or will it read from the latest inserted messages (as that is the invoking event)?
Thanks in advance.
Lambda function crashes while doing the processing
This is possible.
and takes a few minutes or even a couple of hours(i don't know if that's even possible) to come back up.
This is not exactly possible.
A Lambda function is only allowed to run until it returns a response, throws an error, or the timeout timer fires, whichever comes first. It would never be a couple of hours.
Lambda will create a new container every time the function is invoked, unless it already has one standing by for you or you are hitting a concurrency limit (typically 1000+).
However... for Kinesis streams, what happens is a bit different because of the need for in-order processing.
Poll-based (or pull model) event sources that are stream-based: These consist of Kinesis Data Streams or DynamoDB. When a Lambda function invocation fails, AWS Lambda attempts to process the erring batch of records until the time the data expires, which can be up to seven days.
The exception is treated as blocking, and AWS Lambda will not read any new records from the shard until the failed batch of records either expires or is processed successfully. This ensures that AWS Lambda processes the stream events in order.
https://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html
So your Lambda function throwing an exception or running past its timeout will simply cause the Lambda service to destroy the container, create a new one, and retry the invocation with the exact same data, over and over, until the data expires (as dictated by the Kinesis retention configuration).
The delay would typically be no longer than your timeout, or the time it takes for the exception to occur, plus some number of milliseconds (up to a few seconds, for a cold start). The timeout is individually configurable on your Lambda function itself, up to 15 minutes (but this max is probably much too long).
It's potentially important to remember a somewhat hidden detail here: part of the Lambda service reads your Kinesis stream and tells another part of the service to invoke your function with the batch of records. In other words, the Lambda service (not your Lambda function) is checking the stream by pulling data; the stream is not technically pushing data to Lambda. DynamoDB streams and SQS work similarly: Lambda pulls the data and handles retries by re-invoking the function. The other service is not responsible for pushing data.
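To make the retry behaviour concrete, here is a minimal sketch of a Kinesis batch handler; process is a hypothetical placeholder, and any exception it raises blocks the shard and causes the whole batch to be re-delivered:

```python
import base64
import json

def process(payload):
    # Placeholder: the real function would transform the record
    # and insert it into Redshift.
    return payload

def lambda_handler(event, context):
    # Kinesis delivers each record's payload base64-encoded.
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        process(payload)  # an exception here makes Lambda retry the batch
    return {"batchSize": len(event["Records"])}
```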

Could AWS Scheduled Events potentially overlap?

I would like to create a Scheduled Event to run a Lambda that executes an API call every 1 minute (cron-like behaviour).
The caveat to this setup is that the external API is unreliable/slow, and the API call can sometimes last longer than 1 minute.
So, my question here is: given this setup and scenario, would AWS run another Scheduled Event and execute the Lambda before the previous execution finishes, i.e. overlap?
If it does, is there a way to configure the Scheduled Event to not "overlap"?
I did some initial research into this and came across this article:
https://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html
It looks like you can set concurrency limits at function level? Is this the way to achieve non-overlapping scheduled lambda executions? i.e. set the function's concurrency limit to 1?
Yes, by default it will execute your Lambda function every 1 minute, regardless of whether the previous invocation has completed.
To enforce no more than one running instance of your Lambda function at a time, set the reserved concurrency of your Lambda function to 1.
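For example, with the AWS CLI (the function name is a placeholder):

```shell
# Reserve a concurrency of 1 so at most one instance runs at a time.
aws lambda put-function-concurrency \
  --function-name my-scheduled-function \
  --reserved-concurrent-executions 1
```

Be aware that invocations arriving while the single instance is busy are throttled; for asynchronous invokes such as scheduled events, Lambda retries them for a while rather than dropping them immediately.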

AWS Lambda faster process way

Currently, I'm implementing a solution based on S3, Lambda and DynamoDB.
My use case is: when a new object is uploaded to S3, a first Lambda function is invoked; it downloads the new file, splits it into around 100 (or more) parts and, for each of them, adds additional information. In the next step, each part is processed by a second Lambda function, and in some cases an insert is performed in DynamoDB.
My question is only about the best way to call the "second Lambda" -- I mean, the fastest way. I want to execute 100 Lambda functions (if I had 100 parts to process) at the same time.
I know there are different possibilities:
1) My first Lambda function can push each part as an item onto a Kinesis stream, and my second Lambda function will react, retrieve an item and process it. In this case I don't know if AWS will launch a new Lambda function each time there is a remaining item in the stream. Maybe there is some limitation...
2) My first Lambda function can push each part to an SNS topic, and then my second Lambda will react to each new message. In this case I have some doubts about the latency (the time between sending a message through the SNS topic and my second Lambda function being executed).
3) My first Lambda function can launch the second one directly by performing an API call and passing the information. In this case I have no idea whether I can launch 100 Lambda functions at the same time. I think I'll be blocked by a rate limit on the AWS API (I said, I think!).
Does anybody have feedback or advice regarding my use case? Once more, the most important thing for me is to have the fastest processing.
Thanks
Lambda limits are in place to provide some sane defaults but many workloads quickly exceed them. You can request an increase so this will not be a bottleneck for your use case. This document describes the process:
http://docs.aws.amazon.com/lambda/latest/dg/limits.html
I'm not sure how much latency your use case can tolerate, but I often use SNS to fan out, and the latency to the next invocation is usually sub-second (unless it's Java/cold start).
If latency is extremely sensitive, then you'd probably want to invoke the Lambdas directly using Invoke with the InvocationType set to "Event". This minimizes blocking while you Invoke 100 times. You could also thread these Invoke calls within your main Lambda function to further increase parallelism if you want to hyper-optimize.
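A sketch of option 3 with boto3: the invoke_async helper and the function name are hypothetical, and the client is passed in so the helper can be exercised without AWS credentials.

```python
import json
from concurrent.futures import ThreadPoolExecutor

def invoke_async(client, function_name, parts):
    # InvocationType='Event' is fire-and-forget: the call returns as
    # soon as the event is queued (HTTP 202), without waiting for the
    # target function to finish.
    def one(part):
        resp = client.invoke(
            FunctionName=function_name,
            InvocationType="Event",
            Payload=json.dumps(part),
        )
        return resp["StatusCode"]

    # Thread the Invoke calls to overlap the HTTP round-trips.
    with ThreadPoolExecutor(max_workers=16) as pool:
        return list(pool.map(one, parts))
```

In the first Lambda you would call it as invoke_async(boto3.client("lambda"), "second-function", parts).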
Cold containers will occasionally add latency to your invocations. If milliseconds count, this can become tricky. People who are trying to hyper-optimize Lambda processing times will sometimes schedule executions of their Lambda function with a "heartbeat" event that returns immediately (so processing time is cheap). These containers then remain "warm" for a short period, which allows them to pick up your events without incurring "cold startup" time. Java containers are much slower to spin up cold than Node containers (I assume Python is probably about as fast as Node, though I haven't tested).