I have an AWS lambda function that will sometimes fail because some other part of the system is not ready yet. In such cases, I want to retry the lambda in a couple of seconds (preferably with exponential backoff). How do I implement that?
It seems like feeding the lambda from an SQS or SNS queue might do the trick, but I can't figure out how to make it fail back to the queue and get retried.
From the Docs:
Any Lambda function invoked asynchronously is retried twice
(You can configure that retry count.) You can set a Dead Letter Queue to keep all failed events for inspection, notification, etc. You can implement further logic to resend those events or drop them, but IMHO you should have a dedicated lambda for that.
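For illustration, here is a minimal sketch of attaching such a Dead Letter Queue to a function with the AWS SDK for JavaScript v3; the function name, region, and queue ARN are placeholders, not anything from the question:

```typescript
import {
  LambdaClient,
  UpdateFunctionConfigurationCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" }); // placeholder region

async function attachDlq(): Promise<void> {
  // Events that still fail after the async retries land in this queue.
  await lambda.send(
    new UpdateFunctionConfigurationCommand({
      FunctionName: "my-function", // placeholder
      DeadLetterConfig: {
        TargetArn: "arn:aws:sqs:us-east-1:123456789012:my-function-dlq", // placeholder
      },
    })
  );
}
```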
Related
I'm having a use case where I have an Amazon SQS FIFO queue feeding a lambda function. I need to make sure that the FIFO queue triggers the lambda only when the previous lambda execution has completed (and that the events arrive in order). The AWS docs say that FIFO supports exactly-once processing, but they do not mention anywhere that it will not push more events to the lambda until the first message is completely processed.
I need to make sure that the next message is processed only when the previous message has been completely processed by the lambda function.
Is there a way to ensure that message 2 is only processed by the lambda once message 1 has been completely processed?
FIFO supports exactly-once processing but it does not mention anywhere that it will not push more events to the lambda until the first message is completely processed.
SQS never pushes anything anywhere. You have to poll SQS for messages. When you configure the Lambda integration with SQS, Lambda actually runs a poller behind the scenes that reads from SQS for you.
AWS FIFO queues allow you to force messages to be processed in order by specifying a Message Group ID. When you specify the same Message Group ID for multiple messages, the FIFO queue will only make one of those messages available at a time (in first-in-first-out order). Only after the first message is removed from the queue is the second message made available, etc.
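As a rough sketch of the producer side (queue URL, region, and group ID are made-up placeholders), every message is sent with the same MessageGroupId so the FIFO queue releases them one at a time, in order:

```typescript
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({ region: "us-east-1" }); // placeholder region
const queueUrl =
  "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo"; // placeholder

async function sendInOrder(): Promise<void> {
  const bodies = ["message 1", "message 2"];
  for (let i = 0; i < bodies.length; i++) {
    await sqs.send(
      new SendMessageCommand({
        QueueUrl: queueUrl,
        MessageBody: bodies[i],
        MessageGroupId: "order-123", // same group => strict FIFO ordering
        MessageDeduplicationId: `order-123-${i}`, // needed unless content-based dedup is enabled
      })
    );
  }
}
```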
In addition to this, you should configure the AWS Lambda SQS integration with a Batch Size of 1, so that it doesn't try to wait for multiple messages to be available before processing. And you could configure the Reserved Concurrency on the Lambda function to 1, as mentioned in the other answer, so that only one instance of the Lambda function can be running at a time.
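A minimal sketch of wiring the queue to the function with a batch size of 1 via the v3 SDK (function name, region, and queue ARN are placeholders) could look like this:

```typescript
import {
  LambdaClient,
  CreateEventSourceMappingCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" }); // placeholder region

async function attachQueue(): Promise<void> {
  await lambda.send(
    new CreateEventSourceMappingCommand({
      FunctionName: "fifo-consumer", // placeholder
      EventSourceArn: "arn:aws:sqs:us-east-1:123456789012:orders.fifo", // placeholder
      BatchSize: 1, // hand the function one message per invocation
    })
  );
}
```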
It is actually pretty easy to do this. The docs don't call it out because, by default, Lambda will simply use the available account concurrency and handle as many messages in parallel as possible.
You can influence this by setting the reserved concurrency for the lambda function to 1. This ensures that no more than one instance of the lambda function executes at the same time.
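For example, a sketch of that setting with the v3 SDK (the function name and region are placeholders):

```typescript
import {
  LambdaClient,
  PutFunctionConcurrencyCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" }); // placeholder region

async function limitToOneInstance(): Promise<void> {
  await lambda.send(
    new PutFunctionConcurrencyCommand({
      FunctionName: "fifo-consumer", // placeholder
      ReservedConcurrentExecutions: 1, // at most one concurrent execution
    })
  );
}
```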
I have a lambda using SQS events as inputs. The SQS queue also has a DLQ.
The lambda function invokes a downstream RESTful API (call this operation DoPostToAPI()).
I need to guarantee that the lambda function attempts to call DoPostToAPI() at least 2 times (before the message goes to the DLQ).
What configuration of Lambda Retries and SQS Redrive policy would I need to set in order to accomplish the above requirement?
I need to be 100% certain that messages which arrive on the DLQ only arrive because DoPostToAPI() was attempted against the downstream API 2 times, and that messages don't arrive in the DLQ for any other reason, if possible.
To me, it makes sense that messages should only arrive on the DLQ if the operation was attempted, and not for other reasons (i.e. I don't want messages to arrive on the DLQ purely because of throttling, since DoPostToAPI() should be attempted first before sending to the DLQ). Why would I want messages on the DLQ if the lambda operation wasn't even attempted? In other words, I need the lambda operation to be guaranteed to be invoked before the item moves to the DLQ.
Can I get some help on this? Is it possible to guarantee that messages on the DLQ have arrived because of failed DoPostToAPI() calls? Or is it (more unfortunately) possible that messages arrive on the DLQ for reasons other than failed calls to the downstream API?
From what I have read online so far, it's possible that Lambda, after receiving an SQS message and making it invisible on the queue, could run into throttling issues and re-attempt the invocation. But if it runs into Lambda throttling again, the message could end up back on the main queue, and once it reaches its max receive count it could be placed on the DLQ without the lambda ever having been attempted. Is this correct?
For simplicity, let's imagine the following inputs:
SQSQueue1
SQSQueue1DLQ
LambdaFunction1 --> ServiceClient1.DoPostToAPI()
What is the interplay between the lambda "maximum_retry_attempts" and the SQS redrive_policy "maxReceiveCount"?
In order to ensure your lambda attempts retries when using SQS, you only need to set the SQS property
maxReceiveCount
This value controls how many times a message can be received (and therefore how many lambda invocations will be attempted for a given batch) before the message goes to the Dead Letter Queue.
Unfortunately, the lambda property
maximum_retry_attempts
does not apply to lambda functions that use SQS as the function event trigger.
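A minimal sketch of setting that redrive policy with the v3 SDK, reusing the queue names from the question as placeholders (region and account ID are also made up), might look like this:

```typescript
import { SQSClient, SetQueueAttributesCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({ region: "us-east-1" }); // placeholder region

async function configureRedrive(): Promise<void> {
  await sqs.send(
    new SetQueueAttributesCommand({
      QueueUrl:
        "https://sqs.us-east-1.amazonaws.com/123456789012/SQSQueue1", // placeholder
      Attributes: {
        RedrivePolicy: JSON.stringify({
          deadLetterTargetArn:
            "arn:aws:sqs:us-east-1:123456789012:SQSQueue1DLQ", // placeholder
          maxReceiveCount: "2", // at least two delivery (and hence invocation) attempts before the DLQ
        }),
      },
    })
  );
}
```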
We noticed that when setting up an AWS lambda to trigger from SQS, a lot of the time the trigger happens minutes, and sometimes up to an hour, late. I know AWS Lambda does polling internally, and when the queue is empty it probably does some exponential backoff.
However, we have a scheduler that runs every 30 minutes and pushes data into the queue, and the lambda is triggered much, much later for a percentage of messages. Our business requirement is that it triggers within a minute.
Is there a way to force the lambda to check the queue consistently? An alternative was to use step functions, but this is not possible due to another answer in this thread --> How do you run functions in parallel?
I was also thinking about pushing data into S3 and having the lambda trigger from S3 asynchronously instead of being polled, but S3 does not have a batch API for pushing a lot of records, so that's out.
It turned out to be incorrect use of async/await with the AWS SDK. The current SDK only supports .promise(). That was the reason not all messages ended up in SQS.
Hope it helps others. AWS is working on a new SDK that will support async/await. Here is the link to their issue on it:
https://github.com/aws/aws-sdk-js-v3/issues/153#issuecomment-457769969
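For anyone hitting the same issue, a rough sketch of what the fix looks like with the v2 JavaScript SDK (queue URL and region are placeholders):

```typescript
import { SQS } from "aws-sdk";

const sqs = new SQS({ region: "us-east-1" }); // placeholder region

async function enqueue(body: string): Promise<void> {
  // Wrong: `await sqs.sendMessage(params)` awaits an AWS.Request, not a Promise,
  // so messages can silently never be sent.
  // Right: call .promise() first, then await the resulting Promise.
  await sqs
    .sendMessage({
      QueueUrl: "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue", // placeholder
      MessageBody: body,
    })
    .promise();
}
```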
I would check to see whether the SQS queue is using long polling or short polling.
"In almost all cases, Amazon SQS long polling is preferable to short polling. Long-polling requests let your queue consumers receive messages as soon as they arrive in your queue while reducing the number of empty ReceiveMessageResponse instances returned.
Amazon SQS long polling results in higher performance at reduced cost in the majority of use cases. However, if your application expects an immediate response from a ReceiveMessage call, you might not be able to take advantage of long polling without some modifications to your application.
For example, if your application uses a single thread to poll multiple queues, switching from short polling to long polling will probably not work, because the single thread will wait for the long-poll timeout on any empty queues, delaying the processing of any queues that might contain messages.
In such an application, it is a good practice to use a single thread to process only one queue, allowing the application to take advantage of the benefits that Amazon SQS long polling provides."
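If the queue turns out to be short polling, a minimal sketch of switching it to long polling (queue URL and region are placeholders) would be:

```typescript
import { SQSClient, SetQueueAttributesCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({ region: "us-east-1" }); // placeholder region

async function enableLongPolling(): Promise<void> {
  await sqs.send(
    new SetQueueAttributesCommand({
      QueueUrl: "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue", // placeholder
      Attributes: {
        // ReceiveMessage calls wait up to 20 seconds for messages to arrive.
        ReceiveMessageWaitTimeSeconds: "20",
      },
    })
  );
}
```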
Also, if you're looking to fire off your lambda function, could you set up an SNS notification flow with your SQS? Something along the lines of SQS --> SNS --> Lambda. This should get you sub-minute latency, and you're not constantly polling the queue for messages; you just react to the SNS notification.
https://aws.amazon.com/sqs/faqs/
I have an SQS queue that is used as an event source for a Lambda function. Due to DB connection limitations, I have set the maximum concurrency to 5 for the Lambda function.
Under normal circumstances, everything works fine, but when we need to make changes, we deliberately disable the SQS trigger. Messages start to back up in the SQS queue as expected.
When the trigger is re-enabled, 5 Lambda functions are instantiated and start to process the messages in the queue; however, I also see CloudWatch telling me that the Lambda is being throttled.
Please could somebody explain why this is happening? I expect the available Lambda functions to simply work through the backlog as fast as they can, and wouldn't expect to see throttling due to the queue.
This is expected behaviour.
"On reaching the concurrency limit associated with a function, any further invocation requests to that function are throttled, i.e. the invocation doesn't execute your function. Each throttled invocation increases the Amazon CloudWatch Throttles metric for the function"
https://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html
I have created a lambda function which is triggered through a CloudWatch Events cron rule.
While testing I found that the lambda retry is not working in the case of a timeout.
I want to understand the expected behaviour. Should a retry happen in case of a timeout?
P.S. I have gone through the documentation on the AWS site but still can't figure it out:
https://docs.aws.amazon.com/lambda/latest/dg/retries-on-errors.html
Found the AWS documentation on this:
"Error handling for a given event source depends on how Lambda is invoked. Amazon CloudWatch Events is configured to invoke a Lambda function asynchronously."
"Asynchronous invocation – Asynchronous events are queued before being used to invoke the Lambda function. If AWS Lambda is unable to fully process the event, it will automatically retry the invocation twice, with delays between retries."
So the retry should happen in this case. Not sure what was wrong with my lambda function; I just deleted and recreated it, and the retry worked this time.
Judging from the docs you linked to, it seems that the lambda function is called again if it has timed out and the timeout is because it is waiting for another resource (i.e. it is blocked by the network):
The function times out while trying to reach an endpoint.
As a cron event is not stream based (whether it is synchronous or asynchronous seems not to be clear from the docs), it will be retried.
CloudWatch Event invokes a Lambda function asynchronously.
For asynchronous invocation, Lambda manages the function's asynchronous event queue and attempts to retry two more times on errors including timeout.
https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html
So with the default configuration, your function should retry on timeout errors. If it doesn't, there might be some other reasons, such as the following:
The function doesn't have enough concurrency to run and events are throttled. Check the function's reserved concurrency setting. It should be at least 1.
When the above happens, events might also be deleted from the queue without ever being sent to the function. Check the function's asynchronous invocation settings: make sure the maximum event age is long enough to keep the events in the queue and that the retry attempts setting is not zero.
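A rough sketch of setting those asynchronous-invocation values with the v3 SDK (the function name, region, and numbers are placeholders):

```typescript
import {
  LambdaClient,
  PutFunctionEventInvokeConfigCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" }); // placeholder region

async function configureAsyncRetries(): Promise<void> {
  await lambda.send(
    new PutFunctionEventInvokeConfigCommand({
      FunctionName: "my-cron-function", // placeholder
      MaximumRetryAttempts: 2,          // 0 would disable retries entirely
      MaximumEventAgeInSeconds: 3600,   // keep events up to an hour while retrying
    })
  );
}
```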