How to monitor and report failures of asynchronously invoked concurrent Lambdas?

I have a Lambda whose first instance invokes itself 30-40 times to process data concurrently. The invocations use the asynchronous, fire-and-forget Event invocation type, so the very first instance exits as soon as the invocations have been sent.
I want the first Lambda to stay alive after invoking the others, report the number of instances triggered, and send an SNS notification if any of them failed. I switched to the RequestResponse invocation type, but now my Lambda invokes one instance, waits for its response (which can take minutes), and only then invokes the next one.
How can I invoke Lambdas asynchronously but still get reporting and tracking from the first instance?

You can report on the number of instances triggered using the first Lambda invocation.
For failures, it is not possible to use the same Lambda instance, because an error may happen much further down the line (as you mentioned, each instance may take minutes to complete). However, you can configure the instances running in parallel to report their own invocation status.
In fact, there is a built-in feature for this: destinations for asynchronous invocations. With it, Lambda sends invocation records (request and response information) to a target such as SQS. By consuming the queue, you should be able to build a report of successful and failed instances.
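A minimal sketch of both halves of this answer, assuming Python and a boto3 Lambda client passed in as `lambda_client`. The first function fans out with the Event invocation type and counts how many invocations Lambda accepted (status 202). The second tallies invocation records read from the destination queue. The names `fan_out` and `tally_records` are made up for illustration, and the record fields used (`requestContext.condition`, `responseContext.functionError`) are my understanding of the invocation record format, so check them against real records from your queue.

```python
import json


def fan_out(lambda_client, function_name, payloads):
    """Invoke function_name once per payload without waiting for results.

    Returns (accepted, rejected) counts. With InvocationType="Event",
    Lambda only queues the event and replies with status 202, so this
    says nothing about whether the worker later succeeds.
    """
    accepted = rejected = 0
    for payload in payloads:
        response = lambda_client.invoke(
            FunctionName=function_name,
            InvocationType="Event",
            Payload=json.dumps(payload).encode("utf-8"),
        )
        if response.get("StatusCode") == 202:
            accepted += 1
        else:
            rejected += 1
    return accepted, rejected


def tally_records(record_bodies):
    """Count successes and failures in destination invocation records.

    record_bodies is an iterable of JSON strings, e.g. SQS message bodies.
    Field names are assumptions about the record format (see lead-in).
    """
    summary = {"succeeded": 0, "failed": 0}
    for body in record_bodies:
        record = json.loads(body)
        condition = (record.get("requestContext") or {}).get("condition")
        error = (record.get("responseContext") or {}).get("functionError")
        if condition == "Success" and not error:
            summary["succeeded"] += 1
        else:
            summary["failed"] += 1
    return summary
```

The first instance can log or publish the `fan_out` counts right away. The failure report would come from whichever consumer reads the queue and calls something like `tally_records`, since those results arrive after the first instance has finished.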

AWS Lambda Functions: Will Different Triggers Reuse an Execution Environment?

Here's what I know, or think I know.
In AWS Lambda, the first time you call a function is commonly called a "cold start" -- this is akin to starting up your program for the first time.
If you make a second function invocation relatively quickly after your first, this cold start won't happen again. This is colloquially known as a "warm start".
If a function is idle for long enough, the execution environment goes away, and the next request will need to cold start again.
It's also possible to have a single AWS Lambda function with multiple triggers. Here's an example of a single function that's handling both API Gateway requests and SQS messages.
My question: Will AWS Lambda reuse (warm start) an execution environment when different event triggers come in? Or will each event trigger have its own cold start? Or is this behavior that's not guaranteed by Lambda?
Yes, different triggers will use the same containers, since the execution environment is the same regardless of the trigger. The only difference is the event that is passed to your Lambda.
You can verify this by executing your Lambda with two types of triggers (e.g. API Gateway and the Test function in the Lambda console) and looking at the CloudWatch logs. Each Lambda container creates its own log stream inside your Lambda's log group. You should see both events logged to the same log stream, which means the second event is using the warm container created by the first event.
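To make that check easier to read in the logs, here is a small sketch of a Python handler that records whether the invocation was a cold start and which log stream it writes to. `context.log_stream_name` is an attribute of the Lambda context object in Python. The trigger detection is a rough guess I added for illustration: it assumes a `Records` list for record-based sources such as SQS and `httpMethod` for API Gateway REST proxy events.

```python
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Module-level code runs once per execution environment (cold start),
# so this flag is True only for the first invocation in each environment.
_is_cold_start = True


def handler(event, context):
    global _is_cold_start
    cold = _is_cold_start
    _is_cold_start = False

    # Rough guess at the trigger type (an assumption, not an official API).
    if isinstance(event, dict) and "Records" in event:
        source = "records"  # e.g. SQS
    elif isinstance(event, dict) and "httpMethod" in event:
        source = "apigateway"
    else:
        source = "unknown"

    logger.info("source=%s cold_start=%s log_stream=%s",
                source, cold, context.log_stream_name)
    return {"source": source, "cold_start": cold,
            "log_stream": context.log_stream_name}
```

If two different triggers log the same `log_stream` value and the second one reports `cold_start=False`, the second event was served by the environment the first one created.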

CloudWatch event Lambda trigger and concurrency

If a Lambda function has a concurrency>1, and there are several instances running, does a CloudWatch event Lambda trigger get sent to all the running instances?
The question wording is a little bit ambiguous. I will try my best to make it more clear.
"If a Lambda function has a concurrency>1, and there are several instances running"
I think OP is talking about reserved concurrency which is set to a value that's greater than 1. In other words, the function is not throttled by default and can run multiple instances in parallel.
"does a CloudWatch event Lambda trigger get sent to all the running instances?"
This part is ambiguous. #hephalump provided one interpretation in the question comment.
I have another interpretation. If you are asking whether the currently-running lambda containers will be reused after the job is done, then here is the answer:
Based on #hephalump's comment, now it's clear that one CloudWatch event will only trigger one lambda instance to run. If multiple events come in during a short period of time, then multiple lambda instances will be triggered to run in parallel. Back to the question, if all existing lambda instances of that function are busy running, then no container will be reused, and another new lambda instance will be spun up to handle this event. If one of the running instances has just finished its job, then that container along with the execution environment will be reused to handle this incoming event from CloudWatch.
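The reuse rule described above can be written down as a toy model. This is only an illustration of the reasoning in this answer, not how Lambda is implemented, and `EnvironmentPool` is a made-up name: each event goes to exactly one environment, an idle environment is reused if there is one, and otherwise a new one is created.

```python
class EnvironmentPool:
    """Toy model of how events are assigned to execution environments."""

    def __init__(self):
        self.busy = set()   # ids of environments currently handling an event
        self.idle = []      # ids of warm environments waiting for work
        self._next_id = 1

    def dispatch(self):
        """Assign one event to exactly one environment and return its id."""
        if self.idle:
            env_id = self.idle.pop()      # warm start: reuse an idle one
        else:
            env_id = self._next_id        # cold start: create a new one
            self._next_id += 1
        self.busy.add(env_id)
        return env_id

    def finish(self, env_id):
        """Mark an environment as idle once its event has been handled."""
        self.busy.remove(env_id)
        self.idle.append(env_id)
```

For example, two `dispatch()` calls in a row return two different ids because the first environment is still busy. After `finish()` on the first id, the next `dispatch()` returns that id again.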
Hope this helps.

Logs from multiple AWS Lambda triggers are displayed in a single CloudWatch log stream

We have created a Lambda function which is triggered every minute. It is working as expected and showing the correct result. But the log stream, which is fed by invocations from CloudWatch Events, contains the logs of multiple Lambda triggers in a single CloudWatch log stream.
Is it possible to create one CloudWatch log stream per Lambda trigger?
As per the AWS Lambda documentation here, a log stream represents an instance of your function. In other words, a log stream represents the logs from a single execution environment for your Lambda function... The execution environment is also called the context (one of the arguments Lambda passes in to your handler), and the reason you're not getting a new log stream with each invocation is the context in which Lambda functions execute.
When you invoke your Lambda function, AWS loads the container that holds your function code and provisions the resources required for your function to execute: CPU, memory, networking etc. These all form the function's execution environment, which is also called the context. These resources take time to provision, and this results in increased latency for the execution of the function. This is commonly known as a "cold start".
To mitigate this latency, AWS does not terminate the execution environment once your function has completed its initial execution. Instead, it keeps the container and the provisioned resources (CPU, memory, networking) ready in anticipation of the next invocation. This is known as keeping the function "warm". While the container is warm, subsequent invocations of your function are executed in the same execution environment, or context, as the previous invocation. Because those invocations are executed by the same instance of the function, their logs are written to the same log stream as the previous invocation(s), which is the log stream that represents that instance / execution environment / context of the function.
Notwithstanding this, it's worth pointing out that AWS does not keep the container running indefinitely. If there is no subsequent invocation within a given period of time (there is no exact documented period, but it is generally considered to be between 30 and 45 minutes), AWS will terminate the container and release the resources for use by another function. The next time the Lambda function is invoked, AWS will repeat the provisioning process and create a new execution environment, so your function's logs will be written to a new log stream that represents the new execution environment / context / instance of your function.
You can read more about Lambda's execution context here.
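Since one stream can hold many invocations, a practical workaround is to split the stream's lines per invocation after the fact. Below is a minimal sketch, assuming the lines have already been fetched as plain strings and that each invocation begins with the `START RequestId: ...` line that Lambda writes to the log. The function name `group_by_request` is made up for illustration. It relies on an environment handling one invocation at a time, so lines from different invocations do not interleave within a stream.

```python
import re

# Lambda writes a line like "START RequestId: <id> Version: $LATEST"
# at the beginning of every invocation.
START_RE = re.compile(r"^START RequestId: (\S+)")


def group_by_request(log_lines):
    """Group log lines from one stream by the invocation that wrote them.

    Returns a dict mapping request id -> list of lines, in order.
    Lines that appear before the first START line are ignored.
    """
    groups = {}
    current = None
    for line in log_lines:
        match = START_RE.match(line)
        if match:
            current = match.group(1)
            groups[current] = []
        if current is not None:
            groups[current].append(line)
    return groups
```

Each group runs from one START line up to the next, so the END and REPORT lines stay with the invocation they belong to.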
Rachit,
"Your Lambda function comes with a CloudWatch Logs log group, with a log stream for each instance of your function. The runtime sends details about each invocation to the log stream, and relays logs and other output from your function's code."
Moreover, from the AWS CloudWatch documentation you can see that a separate log stream is created for each distinct source of logs. In the case of Lambda, that means one stream per Lambda container, where each container might process multiple events.
"A log stream is a sequence of log events that share the same source. Each separate source of logs into CloudWatch Logs makes up a separate log stream."
Ref:
https://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-logging.html

Why is Lambda throttled when invoking from SQS?

I have an SQS queue that is used as an event source for a Lambda function. Due to DB connection limitations, I have set a maximum concurrency to 5 for the Lambda function.
Under normal circumstances, everything works fine, but when we need to make changes, we deliberately disable the SQS trigger. Messages start to back up in the SQS queue as expected.
When the trigger is re-enabled, 5 Lambda instances are started and begin processing the messages in the queue. However, I also see CloudWatch telling me that the Lambda is being throttled.
Please could somebody explain why this is happening? I expect the available Lambda functions to simply work through the backlog as fast as they can, and wouldn't expect to see throttling due to the queue.
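This is expected behaviour. The SQS poller does not hold back once your 5 instances are busy: it keeps reading batches from the backlog and attempting invocations, so the attempts beyond the concurrency limit are throttled and those messages return to the queue once their visibility timeout expires.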
"On reaching the concurrency limit associated with a function, any further invocation requests to that function are throttled, i.e. the invocation doesn't execute your function. Each throttled invocation increases the Amazon CloudWatch Throttles metric for the function"
https://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html
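To see how much throttling actually occurs while the backlog drains, the Throttles metric mentioned in the quote can be summed over a time window. This is a sketch under the assumption that a boto3 CloudWatch client is passed in as `cloudwatch_client`. The helper name `total_throttles` and the 60 minute default window are my own choices.

```python
from datetime import datetime, timedelta, timezone


def total_throttles(cloudwatch_client, function_name, minutes=60):
    """Sum the Lambda Throttles metric for one function over a window."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName="Throttles",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=start,
        EndTime=end,
        Period=60,              # one datapoint per minute
        Statistics=["Sum"],
    )
    # Datapoints are not guaranteed to be ordered; summing does not care.
    return int(sum(point["Sum"] for point in response.get("Datapoints", [])))
```

A non-zero result right after re-enabling the trigger is consistent with the behaviour described in this answer and does not by itself mean messages were lost.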

AWS Lambda retry behavior when triggered by a CloudWatch event

I have created a Lambda function which is triggered by a CloudWatch Events cron rule.
While testing, I found that the Lambda retry does not happen when the function times out.
I want to understand the expected behaviour. Should a retry happen in case of a timeout?
P.S. I have gone through the documentation on the AWS site but still can't figure it out:
https://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html
I found the AWS documentation on this:
"Error handling for a given event source depends on how Lambda is invoked. Amazon CloudWatch Events is configured to invoke a Lambda function asynchronously."
"Asynchronous invocation – Asynchronous events are queued before being used to invoke the Lambda function. If AWS Lambda is unable to fully process the event, it will automatically retry the invocation twice, with delays between retries."
So the retry should happen in this case. I am not sure what was wrong with my Lambda function; I just deleted it and created it again, and the retry worked this time.
Judging from the docs you linked to it seems that the lambda function is called again if it has timed out and the timeout is because it is waiting for another resource (i.e. is blocked by network):
The function times out while trying to reach an endpoint.
As a cron event is not stream-based (whether it is synchronous or asynchronous does not seem to be clear from the docs), it will be retried.
CloudWatch Event invokes a Lambda function asynchronously.
For asynchronous invocation, Lambda manages the function's asynchronous event queue and attempts to retry two more times on errors including timeout.
https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html
So with the default configuration, your function should be retried on timeout errors. If it isn't, there might be other reasons, such as the following:
The function doesn't have enough concurrency to run, and events are throttled. Check the function's reserved concurrency setting. It should be at least 1.
When the above happens, events might also be deleted from the queue without being sent to the function. Check the function's asynchronous invocation settings: make sure the maximum event age is long enough to keep events in the queue and that the number of retry attempts is not zero.
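The two checks above can be sketched as a small pure function. It takes dicts shaped like the responses of the boto3 calls `get_function_event_invoke_config` and `get_function_concurrency`, so it can be tried without AWS access. The function name `check_async_settings` and the `min_age_seconds` threshold are assumptions of mine. The fallback values of 2 retries and 21600 seconds (6 hours) are Lambda's defaults for asynchronous invocation.

```python
def check_async_settings(invoke_config, concurrency, min_age_seconds=300):
    """Return a list of warnings about settings that can prevent retries.

    invoke_config: dict shaped like the get_function_event_invoke_config
        response; pass {} if the function has never been configured.
    concurrency: dict shaped like the get_function_concurrency response;
        no "ReservedConcurrentExecutions" key means no reserved limit.
    min_age_seconds: hypothetical threshold chosen for this sketch.
    """
    warnings = []

    reserved = concurrency.get("ReservedConcurrentExecutions")
    if reserved is not None and reserved < 1:
        warnings.append("reserved concurrency is 0, so every invocation is throttled")

    # Defaults when nothing is configured: 2 retries, 6 hour event age.
    retries = invoke_config.get("MaximumRetryAttempts", 2)
    if retries == 0:
        warnings.append("retry attempts is 0, so failed events are not retried")

    max_age = invoke_config.get("MaximumEventAgeInSeconds", 21600)
    if max_age < min_age_seconds:
        warnings.append(
            f"maximum event age is {max_age}s, so throttled events may expire before a retry"
        )

    return warnings
```

With a real client, the inputs would come from `lambda_client.get_function_event_invoke_config(FunctionName=...)` and `lambda_client.get_function_concurrency(FunctionName=...)`. As far as I know, the first call raises an error when no configuration has ever been set, in which case passing `{}` applies the defaults.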