What happens if a timeout handler is not cancelled inside a Lambda function? - amazon-web-services

I have a Lambda function that sets a timeout handler with a certain delay (60 seconds) at the beginning.
I'd like to know the exact behavior of Lambda when the timeout handler is not cancelled before the Lambda returns its response (in less than 60 seconds). In particular, when there are hundreds of Lambda invocations, will the uncancelled timeout handler from a previous execution affect the next invocation that runs on the same instance? More info: the Lambda function is invoked asynchronously.

You haven't mentioned which language you're using or provided any code indicating how you're creating timeouts, but the general process is described at AWS Lambda execution environment.
Lambda freezes the execution environment after an invocation. It remains frozen up to a certain maximum amount of time (15 minutes, as far as I know), and is thawed and re-used if a new invocation arrives quickly enough.
A key quote from the documentation is:
Background processes or callbacks that were initiated by your Lambda function and did not complete when the function ended [will] resume if Lambda reuses the execution environment. Make sure that any background processes or callbacks in your code are complete before the code exits.

As you wrote in the comments, the Lambda is written in Python.
This simple example shows that the event passes to the next invocation:
The code:
import json
import signal
import random

def delayed(val):
    print("Delayed:", val)

def lambda_handler(event, context):
    r = random.random()
    print("Generated", r)
    # schedule SIGALRM to fire 1 second from now - i.e. after the handler returns
    signal.signal(signal.SIGALRM, lambda *args: delayed(r))
    signal.setitimer(signal.ITIMER_REAL, 1)
    return {'statusCode': 200}
Yields (CloudWatch logs, screenshot omitted): the "Delayed:" line printed with the value from one invocation appears in the log of a subsequent invocation.
Think about the way AWS implements Lambdas:
When a Lambda is invoked, a container is started and the environment begins to initialize (this is the cold-start phase).
During this initialization the Python interpreter starts, and behind the scenes AWS code fetches events from the Lambda service and triggers your handler.
This initialization is costly, so AWS prefers to keep the same "process" waiting for the next event. In the happy flow, the next event arrives "fast enough" after the previous one finished, so the initialization is spared and everyone is happy.
Otherwise, after a short period, the container is shut down.
As long as the interpreter is still alive, the signal we armed in one invocation will leak into the next invocation.
Note also Lambda concurrency: two invocations that run in parallel run in different containers, hence with different interpreters, so the alarm will not leak between them.
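Following from the explanation above, the fix is simply to disarm the timer before the handler returns. A minimal sketch, assuming the same signal-based timeout pattern as the example (the 60-second delay matches the question; the work itself is a placeholder):

```python
import signal

def on_timeout(*args):
    print("Timed out!")

def lambda_handler(event, context):
    # arm a 60-second watchdog at the start of the handler
    signal.signal(signal.SIGALRM, on_timeout)
    signal.setitimer(signal.ITIMER_REAL, 60)
    try:
        response = {'statusCode': 200}  # real work goes here
    finally:
        # disarm the timer so it cannot fire in a later invocation
        # that reuses this (frozen and thawed) execution environment
        signal.setitimer(signal.ITIMER_REAL, 0)
    return response
```

The try/finally guarantees the alarm is cancelled even when the work raises, so nothing leaks into the next invocation on the same instance.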

Related

What's the difference between the SQS batch window duration and ReceiveMessage wait time seconds?

You can specify SQS as an event source for Lambda functions, with the option of defining a batch window duration.
You can also specify the WaitTimeSeconds for a ReceiveMessage call.
What are the key differences between these two settings?
What are the use cases?
They're fundamentally different.
The receive message wait time setting determines whether your application does long or short polling. You should (almost) always opt for long polling, as it helps reduce your costs; the details of how are in the documentation.
It can be set in 2 different ways:
on the queue level, by setting the ReceiveMessageWaitTimeSeconds queue attribute
on the request level, by passing the WaitTimeSeconds parameter on individual ReceiveMessage calls
It determines how long your application will wait for a message to become available in the queue before returning an empty result.
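The behaviour just described can be sketched with a toy model of ReceiveMessage (illustrative only; the queue, clock, and sleep are stand-ins, not the SQS API):

```python
import time
from collections import deque

def receive_message(queue, wait_time_seconds=0,
                    now=time.monotonic, sleep=time.sleep):
    """Toy ReceiveMessage: WaitTimeSeconds=0 is short polling (an empty
    queue returns an empty result immediately); a positive value is long
    polling (the call blocks until a message arrives or the wait expires)."""
    deadline = now() + wait_time_seconds
    while True:
        if queue:
            return [queue.popleft()]  # a message became available
        if now() >= deadline:
            return []                 # empty response, still one billed request
        sleep(0.05)                   # keep waiting inside the same request
```

With long polling you make far fewer empty ReceiveMessage calls for the same message throughput, which is where the cost saving comes from.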
On the other hand, you can configure an SQS queue as an event source for Lambda functions by adding it as a trigger.
When creating an SQS trigger, you have 2 optional fields:
batch size (the number of messages in each batch to send to the function)
batch window (the maximum amount of time to gather SQS messages before invoking the function, in seconds)
The batch window setting maps to the MaximumBatchingWindowInSeconds attribute of the SQS event source mapping.
It's the maximum amount of time, in seconds, that the Lambda poller waits to gather the messages from the queue before invoking the function. The batch window just ensures that more messages have accumulated in the SQS queue before the Lambda function is invoked. This increases the efficiency and reduces the frequency of Lambda invocations, helping you reduce costs.
It's important to note that it's defined as a maximum, as it's not guaranteed.
As per the docs, your Lambda function may be invoked as soon as any of the below are true:
the batch size limit has been reached
the batching window has expired
the batch reaches the payload limit of 6 MB
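Those three flush conditions can be sketched as a small batching loop (an illustrative model of the behaviour, not the actual Lambda poller; the parameter names are made up):

```python
import time

def gather_batch(poll_once, max_batch=10, max_window=5.0,
                 max_bytes=6 * 1024 * 1024, now=time.monotonic):
    """Collect messages until any flush condition is met.

    poll_once() stands in for one receive call and returns a
    (possibly empty) list of message bodies.
    """
    batch, payload = [], 0
    deadline = now() + max_window
    while (len(batch) < max_batch        # batch size limit not reached
           and now() < deadline          # batching window not expired
           and payload < max_bytes):     # 6 MB payload limit not reached
        for body in poll_once():
            batch.append(body)
            payload += len(body.encode())
    return batch                         # this batch invokes the function
```

Whichever condition trips first ends the gathering phase, which is why the batch window is only a maximum.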
To conclude, both features are used to control how long something waits but the resulting behaviour differs.
In the first case, you're controlling how long the poller (your application) could wait before it detects a message in your SQS queue & then immediately returns. You could set this value to 10 seconds, but if a message is detected on the queue after 5 seconds, the call will return. You can change this value per request, or set a universal value at the queue level. You can take advantage of long (or short) polling with or without Lambda functions, as it's available via the AWS API, console, CLI and any official SDK.
In the second case, you're controlling how long the poller (the inbuilt Lambda poller) could wait before actually invoking your Lambda to process the messages. You could set this value to 10 seconds, and even if a message is detected on the queue after 5 seconds, it may still not invoke your Lambda. The actual behaviour as to when your function is invoked will differ based on the batch size & payload limits. This value is set on the event source mapping (i.e. at the Lambda trigger level), not per request. This option is only available when using Lambda functions.
You wouldn't use both together: long/short polling is for a constantly running application or one-off calls, whereas a Lambda function cannot poll SQS for more than 15 minutes, and even that requires a manual invocation.
For Lambda functions, you would use native SQS event sourcing and for any other service/application/use case, you would manually integrate SQS.
They're the same in the sense that both ultimately help you reduce costs, but very different in terms of where you can use them.

Boto3 invocations of long-running Lambda runs break with TooManyRequestsException

Experience with "long-running" Lambda's
In my company, we recently ran into this behaviour when triggering Lambdas that run for > 60 seconds (boto3's default timeout for connection establishment and reads).
The beauty of invoking a Lambda with boto3 (using the 'InvocationType' 'RequestResponse') is that the API returns the result state of the respective Lambda run, so we wanted to stick with that.
The issue seems to be that the client fires too many requests per minute on the standing connection to the API. We therefore experimented with the boto3 client configuration, but increasing the read timeout resulted in new (unwanted) invocations after each timeout period, and increasing the connection timeout triggered a new invocation after the Lambda had finished.
Workaround
As various investigations and experiments with boto3's Lambda client did not result in a working setup using 'RequestResponse' invocations, we now circumvent the problem by making use of CloudWatch logs. For this, the Lambda has to be set up to write to an accessible log group. These logs can then be queried for the state. You would invoke the Lambda and monitor it like this:
import time

import boto3

lambda_client = boto3.client('lambda')
logs_client = boto3.client('logs')

invocation = lambda_client.invoke(
    FunctionName='your_lambda',
    InvocationType='Event'
)
# Identifier of the invoked Lambda run
request_id = invocation['ResponseMetadata']['RequestId']
while True:
    # filter the logs for the Lambda end event
    events = logs_client.filter_log_events(
        logGroupName='your_lambda_loggroup',
        filterPattern=f'"END RequestId: {request_id}"'
    ).get('events', [])
    if len(events) > 0:
        # the Lambda invocation finished
        break
    time.sleep(5)  # avoid hammering the CloudWatch Logs API
This approach works for us for now, but it's honestly ugly. To make it slightly better, I recommend setting a time range filter in the filter_log_events call.
One thing that was not tested (yet): the above approach only tells whether the Lambda terminated, not its final state (failed or successful), and the default logs don't hold anything useful in that regard. Therefore, I will investigate whether a Lambda run can know its own request id at runtime. The Lambda code could then be prepared to also write error messages with the request id, which can then be filtered for as well.
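On that open question: in the Python runtime, the handler's context object does expose the current request id as context.aws_request_id, so the Lambda can tag its own error logs. A sketch along those lines (the work function and log format here are hypothetical):

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def do_work(event):
    # hypothetical placeholder for the Lambda's real work
    return {'processed': True}

def lambda_handler(event, context):
    try:
        return do_work(event)
    except Exception:
        # write a line the monitoring side can filter for,
        # analogous to filtering for "END RequestId: ..."
        logger.exception("FAILED RequestId: %s", context.aws_request_id)
        raise
```

The monitoring loop could then run a second filter_log_events call with a filter pattern like "FAILED RequestId: {request_id}" to distinguish failure from success.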

Does AWS Lambda support multi-threading?

I am writing an AWS Lambda function that reads the past 1 minute of data from a DB, converts it to JSON, and then pushes it to a Kafka topic.
The Lambda will run every 1 minute.
So consider a scenario like this:
at t1, a Lambda process is invoked, let's say P1; it is reading data from t1-1 to t1.
at t2, if P1 is not finished, will a new Lambda process be invoked, or will we wait for P1 to finish before invoking another process?
I understand that Lambda supports up to 1000 parallel executions in a region, but in this situation the Lambda function will already be running in a process.
Lambda does support multi-threading and multi-processing within the same execution (see an example).
However, from the context that you gave, that doesn't seem to be the underlying question.
Concurrency allows you to configure how many instances of a single Lambda function can be in execution at a given time. Quoting the relevant part of the documentation:
Concurrency is the number of requests that your function is serving at any given time. When your function is invoked, Lambda allocates an instance of it to process the event. When the function code finishes running, it can handle another request. If the function is invoked again while a request is still being processed, another instance is allocated, which increases the function's concurrency. The total concurrency for all of the functions in your account is subject to a per-region quota.
Triggers define when a Lambda is executed. If multiple events arrive (from SQS, for example) at a Lambda with concurrency > 1, it's likely that multiple instances of that Lambda will be running at the same time.
With concurrency=1, if you trigger the Lambda every minute and it takes more than 1 minute to execute and finish, your processing will lag behind. In other words, future invocations will be processing t-2, t-3, and so on.
With concurrency=1, if you want something to be processed every 1 minute, you have to make sure it doesn't take more than 1 minute to process it. With extra concurrency it can take longer.

Could AWS Scheduled Events potentially overlap?

I would like to create a Scheduled Event to run a Lambda that executes an API call every 1 minute (cron-like behaviour).
The caveat to this setup is that the external API is unreliable/slow, and the API call can sometimes last longer than 1 minute.
So my question is: given this setup & scenario, would AWS run another Scheduled Event and execute the Lambda before the previous execution finished? I.e. overlap?
If it does, is there a way to configure the Scheduled Event not to "overlap"?
I did some initial research into this and came across this article:
https://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html
It looks like you can set concurrency limits at the function level. Is this the way to achieve non-overlapping scheduled Lambda executions, i.e. setting the function's concurrency limit to 1?
Yes, by default it will execute your Lambda function every 1 minute, regardless of whether the previous invocation has completed or not.
To enforce no more than one running instance of your Lambda function at a time, set the Concurrency setting of your Lambda function to 1.

Request time out from AWS Lambda

I'm trying to create an app with the Serverless Framework.
Every function works fine on my local machine.
But the remote machine (Lambda) gives the error message below (this takes about 30 seconds):
{
    "message": "Endpoint request timed out"
}
The code size is 37.5 MB, and the instance memory size is 3008 MB.
Any advice or suggestion would be appreciated. Thank you in advance
I solved my problem by setting callbackWaitsForEmptyEventLoop = false.
By default, calling the callback() function in a Node.js Lambda function does not end the function execution. It will continue running until the event loop is empty. A common cause of Node.js Lambda functions continuing to run after the callback is called is holding on to open database connections. You haven't posted any code, so I can't give specific recommendations, but you would need to determine whether you are leaving database connections open in your code or something similar.
Here's what the documentation says about the behavior of callbackWaitsForEmptyEventLoop:
callbackWaitsForEmptyEventLoop
The default value is true. This property is useful only to modify the default behavior of the callback. By default, the callback will wait until the Node.js runtime event loop is empty before freezing the process and returning the results to the caller. You can set this property to false to request AWS Lambda to freeze the process soon after the callback is called, even if there are events in the event loop. AWS Lambda will freeze the process, any state data and the events in the Node.js event loop (any remaining events in the event loop processed when the Lambda function is called next and if AWS Lambda chooses to use the frozen process). For more information about callback, see Using the Callback Parameter.