I have an AWS Lambda Function which is called using API gateway. This has a default timeout of 30 seconds.
The processing I need to do takes longer than 30 seconds so I have a second Lambda which is called by the first like this.
using (var client = new AmazonLambdaClient(RegionEndpoint.EUWest2))
{
var lambdaRequest = new InvokeRequest
{
FunctionName = "****LambdaFunction",
InvocationType = "Event",
Payload = JsonConvert.SerializeObject(callProcessingRequest)
};
client.InvokeAsync(lambdaRequest);
}
I've got a couple of issues with it.
First is the documentation says that InvocationType="Event" should call the Lambda async, but that doesn't seem to be the case, it takes about 10 seconds to run that invoke call.
The second more urgent issue is i'm getting intermittant errors logged like this.
1. Lambda encountered an UnobservedTaskException via 'TaskScheduler.UnobservedTaskException' event:
2. A Task's exception(s) were not observed either by Waiting on the Task or accessing its Exception property. As a result, the unobserved exception was rethrown by the finalizer thread. (Signature expired: 20230119T093505Z is now earlier than 20230119T093648Z (20230119T094148Z - 5 min.))
I created a Step Function with an Activity in it, when this SF starts it remains Running as it waits for manual approval. Correct.
At this point, I can invoke a lambda function (a worker) that calls the method getActivityTask. The answer to this method is something like this:
{
"taskToken": "xyz",
"input": "<THE INPUT PROVIDED>"
}
The taskToken value is useful because I need it to call a sendTaskSuccess or sendTaskFailure to restart or fail the SF execution.
Everything seems clear.
The thing I don't understand is why getActivityTask provide a result only the first time is invoked. I mean: if I have a SF execution blocked for manual approval and if I exec two times the call to getActivityTask, the first time it answers with the payload above, and the second time it is null.
I cannot find any info about this behaviour on the doc.
I have a dynamoDB stream which is triggering a lambda handler that looks like this:
let failedRequestId: string
await asyncForEachSerial(event.Records, async (record) => {
try {
await handle(record.dynamodb.OldImage, record.dynamodb.NewImage, record, context)
return true
} catch (e) {
failedRequestId = record.dynamodb.SequenceNumber
}
return false //break;
})
return {
batchItemFailures:[ { itemIdentifier: failedRequestId } ]
}
I have my lambda set up with a DestinationConfig.onFailure pointing to a DLQ I configured in SQS. The idea behind the handler is to process a batch of events and interrupt at the first failure. Then it reports the most recent failure in 'batchItemFailures' which tells the stream to continue at that record next try. (I pulled the idea from this article)
My current issue is that if there is a genuine failure of my handle() function on one of those records, then my exit code will trigger that record as my checkpoint for the next handler call. However the dlq condition doesn't ever trigger and I end up processing that record over and over again. I should also note that I am trying to avoid reprocessing records multiple times since handle() is not idempotent.
How can I elegantly handle errors while maintaining batching, but without triggering my handle() function more than once for well-behaved stream records?
I'm not sure if you have found the answer you were looking for. I'll respond in case someone else come across this issue.
There are 2 other parameters you'd want to use to avoid that issue. Quoting documentation (https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html):
Retry attempts – The maximum number of times that Lambda retries when the function returns an error. This doesn't apply to service errors or throttles where the batch didn't reach the function.
Maximum age of record – The maximum age of a record that Lambda sends to your function.
Basically, you'll have to specify how many time the failures should be retried and how far back in the events Lambda should be looking at.
I am wondering something, and I really can't find information about it. Maybe it is not the way to go but, I would just like to know.
It is about Lambda working in batches. I know I can set up Lambda to consume batch messages. In my Lambda function I iterate each message, and if one fails, Lambda exits. And the cycle starts again.
I am wondering about slightly different approach
Let's assume I have three messages: A, B and C. I also take them in batches. Now if the message B fails (e.g. API call failed), I return message B to SQS and keep processing the message C.
Is it possible? If it is, is it a good approach? Because I see that I need to implement some extra complexity in Lambda and what not.
Thanks
There's an excellent article here. The relevant parts for you are...
Using a batchSize of 1, so that messages succeed or fail on their own.
Making sure your processing is idempotent, so reprocessing a message isn't harmful, outside of the extra processing cost.
Handle errors within your function code, perhaps by catching them and sending the message to a dead letter queue for further processing.
Calling the DeleteMessage API manually within your function after successfully processing a message.
The last bullet point is how I've managed to deal with the same problem. Instead of returning errors immediately, store them or note that an error has occurred, but then continue to handle the rest of the messages in the batch. At the end of processing, return or raise an error so that the SQS -> lambda trigger knows not to delete the failed messages. All successful messages will have already been deleted by your lambda handler.
sqs = boto3.client('sqs')
def handler(event, context):
failed = False
for msg in event['Records']:
try:
# Do something with the message.
handle_message(msg)
except Exception:
# Ok it failed, but allow the loop to finish.
logger.exception('Failed to handle message')
failed = True
else:
# The message was handled successfully. We can delete it now.
sqs.delete_message(
QueueUrl=<queue_url>,
ReceiptHandle=msg['receiptHandle'],
)
# It doesn't matter what the error is. You just want to raise here
# to ensure the trigger doesn't delete any of the failed messages.
if failed:
raise RuntimeError('Failed to process one or more messages')
def handle_msg(msg):
...
For Node.js, check out https://www.npmjs.com/package/#middy/sqs-partial-batch-failure.
const middy = require('#middy/core')
const sqsBatch = require('#middy/sqs-partial-batch-failure')
const originalHandler = (event, context, cb) => {
const recordPromises = event.Records.map(async (record, index) => { /* Custom message processing logic */ })
return Promise.allSettled(recordPromises)
}
const handler = middy(originalHandler)
.use(sqsBatch())
Check out https://medium.com/#brettandrews/handling-sqs-partial-batch-failures-in-aws-lambda-d9d6940a17aa for more details.
As of Nov 2019, AWS has introduced the concept of Bisect On Function Error, along with Maximum retries. If your function is idempotent this can be used.
In this approach you should throw an error from the function even if one item in the batch is failing. AWS with split the batch into two and retry. Now one half of the batch should pass successfully. For the other half the process is continued till the bad record is isolated.
Like all architecture decisions, it depends on your goal and what you are willing to trade for more complexity. Using SQS will allow you to process messages out of order so that retries don't block other messages. Whether or not that is worth the complexity depends on why you are worried about messages getting blocked.
I suggest reading about Lambda retry behavior and Dead Letter Queues.
If you want to retry only the failed messages out of a batch of messages it is totally doable, but does add slight complexity.
A possible approach to achieve this is iterating through a list of your events (ex [eventA, eventB, eventC]), and for each execution, append to a list of failed events if the event failed. Then, have an end case that checks to see if the list of failed events has anything in it, and if it does, manually send the messages back to SQS (using SQS sendMessageBatch).
However, you should note that this puts the events to the end of the queue, since you are manually inserting them back.
Anything can be a "good approach" if it solves a problem you are having without much complexity, and in this case, the issue of having to re-execute successful events is definitely a problem that you can solve in this manner.
SQS/Lambda supports reporting batch failures. How it works is within each batch iteration, you catch all exceptions, and if that iteration fails add that messageId to an SQSBatchResponse. At the end when all SQS messages have been processed, you return the batch response.
Here is the relevant docs section: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting
To use this feature, your function must gracefully handle errors. Have your function logic catch all exceptions and report the messages that result in failure in batchItemFailures in your function response. If your function throws an exception, the entire batch is considered a complete failure.
To add to the answer by David:
SQS/Lambda supports reporting batch failures. How it works is within each batch iteration, you catch all exceptions, and if that iteration fails add that messageId to an SQSBatchResponse. At the end when all SQS messages have been processed, you return the batch response.
Here is the relevant docs section: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting
I implemented this, but a batch of A, B and C, with B failing, would still mark all three as complete. It turns out you need to explicitly define the lambda event source mapping to expect a batch failure to be returned. It can be done by adding the key of FunctionResponseTypes with the value of a list containing ReportBatchItemFailures. Here is the relevant docs: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting
My sam template looks like this after adding this:
Type: SQS
Properties:
Queue: my-queue-arn
BatchSize: 10
Enabled: true
FunctionResponseTypes:
- ReportBatchItemFailures
I have a lambda function that sends http call to a API(Let's say 'A'). After getting response from 'A' Immediately return the stuff's to the caller i.e., (callback(null, success)) within 10secs. Then save the Data fetched from API 'A' to My External API(Let's Say 'B').
I tried like below but Lambda waits until event loop is empty(It is waiting for the response from second http call).
I doesn't want to set the eventLoopWaitEmpty to false since it freezes the eventloop and Execute next time when invoked.
request.get({url: endpointUrlA},
function (errorA, responseA, bodyA) {
callback(null, "success");
request.post({url: endpointUrlB,
body: bodyA,
json: true}, function(errorB, responseB, bodyB){
//Doesn't want to wait for this response
});
/* Also tried the callback(null, "success"); here too
});
Anybody have any thoughts on How can I implement this? Thanks!
PS - Btw I read the Previous similar questions doesn't seems to clear with those.
This seems like a good candidate for breaking up this lambda into two lambdas with some support code.
First lambda recieves request to 'A' and places a message onto SQS. It then returns to the caller the success status.
A separate process monitors the SQS queue and invokes a second Lambda on it when a message becomes available.
This has several benefits.
Firstly, you no longer have a long-running lambda waiting for a second system that may be down to return.
Secondly, you're doing things asynchronously in the background.
Take a look at this blog post for an overview of how this could work in practice.