AWS Lambda Function invoked by SQS trigger is not respecting the visibility timeout I'm manually setting within the function

I'm implementing my own webhooks service which will send out events to subscribed webhooks.
Overview of architecture:
events are pushed onto an SQS queue
a lambda function is triggered by SQS messages (event source mapping)
for each event, I make outgoing http requests to subscribed webhooks
non-2xx responses must be retried with exponential backoff (when that happens, I change the message visibility on the received message)
since Lambdas invoked by SQS automatically delete the message upon successful completion, I throw an error at the end of the function to prevent the automatic delete
As far as I can tell, the call to change the message visibility is succeeding. I'm wondering if there's something else baked into Lambdas that are invoked by SQS. Upon failure from the Lambda, is the integration internally changing the message visibility again? Or do SQS-invoked Lambdas simply not respect message visibility changes (which really wouldn't make sense to me)? Curious if anyone has any insight into this problem. I was quite surprised to find out that Lambda automatically deletes messages upon success, since it makes my particular use case feel a little clunky: throwing an error to fail the Lambda function just to prevent the message from being deleted.
Thanks in advance!

The nature of the SQS integration with Lambda is that the integration controls the polling of messages. The integration deletes a message only when the Lambda returns without error. It's not clearly stated in the documentation, but I believe that when an error occurs the integration sets the visibility timeout to zero in order to make the message immediately available for another consumer to pick up. So in your example you are setting it to some number that allows you to retry, but when you return an error the integration is setting the timeout back to zero. If you need more control over that process, you probably should not use the integration.

Update: this is actually not the case. I was failing to properly await the call that adjusts the timeout, so the Lambda was exiting before that request completed. The timeout I'm setting on the message within the Lambda is being respected; I then throw an error to prevent the message from being deleted.

Since SQS triggers for Lambdas process messages in batches, using await inside a standard forEach doesn't work (forEach does not await async callbacks). To work around that, you can create your own version of an async forEach:
var AWS = require('aws-sdk');

exports.handler = async function (event, context) {
  var sqs = new AWS.SQS({ apiVersion: '2012-11-05' });
  await asyncForEach(event.Records, async (record) => {
    const { receiptHandle } = record;
    const sqsParams = {
      QueueUrl: '<YOUR_QUEUE_URL>',      /* required */
      ReceiptHandle: receiptHandle,      /* required */
      VisibilityTimeout: '<in seconds>'  /* required; must be an integer number of seconds, not a string */
    };
    try {
      // Await the promise so the Lambda doesn't exit before the call completes.
      let res = await sqs.changeMessageVisibility(sqsParams).promise();
      console.log(res);
    } catch (err) {
      console.log(err, err.stack);
      throw new Error('Fail.');
    }
  });
  return {};
};

async function asyncForEach(array, callback) {
  for (let index = 0; index < array.length; index++) {
    await callback(array[index], index, array);
  }
}
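Incidentally, on current Node runtimes a plain for...of loop over event.Records awaits just as well. And since the goal here is exponential backoff, note that each SQS record carries an ApproximateReceiveCount attribute you can derive the next visibility timeout from. A minimal sketch (the base and cap values are made-up illustrations, not anything the question specifies):
// Sketch: derive a backoff from how many times SQS has delivered the message.
// ApproximateReceiveCount is a standard attribute on Lambda SQS event records.
function backoffSeconds(record) {
  const receiveCount = Number(record.attributes.ApproximateReceiveCount) || 1;
  const base = 30;    // hypothetical base delay in seconds
  const cap = 43200;  // SQS visibility timeout maxes out at 12 hours
  return Math.min(cap, base * 2 ** (receiveCount - 1));
}
// Usage in the handler above: sqsParams.VisibilityTimeout = backoffSeconds(record);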

Related

Is AWS SQS FIFO queue really exactly-once delivery?

I have the below function handler code.
public async Task FunctionHandler(SQSEvent evnt, ILambdaContext context)
{
    foreach (var message in evnt.Records)
    {
        // Do work.
        // If a message is processed successfully, delete the SQS message.
        // If a message fails to process, throw an exception.
    }
}
It is very confusing that, although I don't have validation logic that checks whether a record already exists in my database, I see database records with the same ID created twice, meaning the same message was processed more than once!
In my code I delete the message after successful processing, or throw an exception upon failure, assuming all remaining ordered messages will simply go back to the queue and become visible for any consumer to reprocess. But I can see the code failing now, because the same records are created twice for an event that succeeded.
Is AWS SQS FIFO exactly-once delivery, or am I missing some kind of retry processing policy?
This is how I delete the message upon successful processing.
var deleteMessageRequest = new DeleteMessageRequest
{
    QueueUrl = _sqsQueueUrl,
    ReceiptHandle = message.ReceiptHandle
};
var deleteMessageResponse =
    await _amazonSqsClient.DeleteMessageAsync(deleteMessageRequest, cancellationToken);
if (deleteMessageResponse.HttpStatusCode != HttpStatusCode.OK)
{
    throw new AggregateSqsProgramEntryPointException(
        $"Amazon SQS DELETE ERROR: {deleteMessageResponse.HttpStatusCode}\r\nQueueURL: {_sqsQueueUrl}\r\nReceiptHandle: {message.ReceiptHandle}");
}
The documentation is very explicit about this:
"FIFO queues provide exactly-once processing, which means that each
message is delivered once and remains available until a consumer
processes it and deletes it."
They also mention protecting your code from retries, which seems odd for an exactly-once delivery queue type, and then I see the below in their documentation, which adds to the confusion.
Exactly-once processing.
Unlike standard queues, FIFO queues don't
introduce duplicate messages. FIFO queues help you avoid sending
duplicates to a queue. If you retry the SendMessage action within the
5-minute deduplication interval, Amazon SQS doesn't introduce any
duplicates into the queue.
Consumer retries (how is this possible?)
If the consumer detects a failed ReceiveMessage action, it can retry
as many times as necessary, using the same receive request attempt ID.
Assuming that the consumer receives at least one acknowledgement
before the visibility timeout expires, multiple retries don't affect
the ordering of messages.
This turned out to be entirely our own application error, in how we treat the event-sourcing aggregate endpoints, which are not thread-safe.
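That said, the quoted passage is the key: exactly-once processing holds only while the received message stays invisible. If processing outlives the visibility timeout, another consumer can legitimately receive the same message, so a common defence is to make the consumer idempotent. A minimal sketch in Node.js to match the rest of the thread (the table and attribute names here are hypothetical), using a DynamoDB conditional put as a deduplication guard:
const AWS = require('aws-sdk');
const ddb = new AWS.DynamoDB.DocumentClient();

// Sketch: record the message ID first; only do the work if we are the
// first to record it, so a redelivered message becomes a no-op.
async function processOnce(message, doWork) {
  try {
    await ddb.put({
      TableName: 'ProcessedMessages',                        // hypothetical table
      Item: { messageId: message.messageId },
      ConditionExpression: 'attribute_not_exists(messageId)' // fail on duplicates
    }).promise();
  } catch (err) {
    if (err.code === 'ConditionalCheckFailedException') {
      return; // duplicate delivery: skip the work
    }
    throw err;
  }
  await doWork(message);
}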

AWS - Lambda and SQS behavior

I have set up an SQS queue, a dead-letter queue (DLQ) and a Lambda. I have some questions regarding the body format that the SQS queue receives.
After some tries, I can see some messages ending up in the DLQ.
I have read a lot of documentation, but I still can't figure out some basic things.
My task is fairly simple: I will use Axios to send some payload to an external API. That API might give me an error, or be unavailable at the time. This code is very simple, but in the end it will include the Axios implementation:
exports.handler = async (event, context) => {
  event.Records.forEach(record => {
    const { body } = record;
    const { messageAttributes } = record;
    console.log(body);
    console.log(messageAttributes);
  });
  let somethingWentWrongWithApi = new Error();
  throw somethingWentWrongWithApi;
};
After this runs, after some time I see the message in the DLQ, and the original queue is empty, as I expected.
In the real code I will have a catch block, and inside it I will throw the error, exactly as in the example. Lambda will try to execute it three times (I am not sure where I got this information). Then it will return the message to SQS, which in turn will push it to the DLQ.
I wonder: should I implement retries in the Lambda and throw the error after three attempts, or just throw it and rely on the existing process (three retries)?
I am still reading about the different settings on both SQS and Lambda.
To change how many receives a message gets before it is moved to the DLQ, go to:
the SQS service;
right-click on your queue;
choose the Configure Queue option;
under the Dead Letter Queue Settings section you can see the
Maximum Receives property (which you can set between 1 and 1000). The same redrive policy can also be set programmatically; see the sketch below.
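A minimal sketch of doing that with the Node.js SDK, assuming aws-sdk v2 and placeholder queue URL and DLQ ARN (the redrive policy is a JSON string holding the DLQ ARN and maxReceiveCount):
const AWS = require('aws-sdk');
const sqs = new AWS.SQS({ apiVersion: '2012-11-05' });

// Sketch: point the source queue at a DLQ and allow 3 receives before redrive.
async function configureRedrive() {
  await sqs.setQueueAttributes({
    QueueUrl: '<YOUR_QUEUE_URL>',
    Attributes: {
      RedrivePolicy: JSON.stringify({
        deadLetterTargetArn: '<YOUR_DLQ_ARN>',
        maxReceiveCount: '3'
      })
    }
  }).promise();
}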
About errors: if something goes wrong on the Lambda side, the Lambda throws an error, and you can check what kind of errors were thrown with CloudWatch.
Go to AWS CloudWatch;
in the navigation panel find Logs;
choose your Log Group.
If you have any additional questions just summon me. I'll try to answer them and extend this answer.

Asynchronous HTTP request in AWS Lambda

I want to execute an HTTP request inside a Lambda function invoked by API Gateway. The problem is that the request takes a bit of time to complete (<20 seconds) and I don't want the client waiting for a response. In my research on asynchronous requests, I learned that I can pass the X-Amz-Invocation-Type:Event header to make the request execute asynchronously; however, this isn't working and the code still "waits" for the HTTP request to complete.
Below is my lambda code:
'use strict';
const https = require('https');

exports.handler = function (event, context, callback) {
  let requestUrl;
  requestUrl = event.queryStringParameters.url;
  https.get(requestUrl, (res) => {
    console.log('statusCode:', res.statusCode);
    console.log('headers:', res.headers);
    res.on('data', (d) => {
      process.stdout.write(d);
    });
  }).on('error', (e) => {
    console.error(e);
  });
  let response = {
    "statusCode": 200,
    "body": JSON.stringify(event.queryStringParameters)
  };
  callback(null, response);
};
Any help would be appreciated.
You can use two Lambda functions.
Lambda 1 is triggered by API Gateway, calls Lambda 2 asynchronously (InvocationType = Event), and then returns a response to the user.
Lambda 2, once invoked, will trigger the HTTP request.
Whatever you do, don't use two lambda functions.
You can't control whether a Lambda is called async or sync; the caller of the Lambda decides that, and API Gateway has decided to call Lambda synchronously.
The possible solutions are one of:
SQS
Step Functions (SF)
SNS
In your API, you call out to one of these services, get back a success response, and then immediately return a 202 to your caller.
If you have a high volume of single- or double-action executions, use SQS. If you have potentially long-running work with complex state logic, use SF. If you for some reason want to ignore my suggestions, use SNS.
Each of these can (and should) call back out to a Lambda. In the case that you need to run for more than 15 minutes, they can call out to CodeBuild instead. Ignore the name of the service; it's just a Lambda that supports up to 8-hour runs.
Now, why not use two Lambdas (L1, L2)? The answer is simple. Once you respond to your users (202) that the async call was queued (SQS, SF, SNS), they'll expect that it works 100% of the time. But what happens if that L2 Lambda fails? It won't retry, it won't continue, and you may never know about it.
That L2 Lambda's handler no longer exists, so you don't know the state any more. Further, you could try to add logging to L2 with a wrapper try/catch, but so many other types of failure could happen. Even if you have that, if CloudWatch is down, will you get the log? Possibly not; it just isn't a reliable strategy. Sure, if you are doing something you don't care about, you can build this architecture, but this isn't how real production solutions are built. You want a reliable process. You want to trust that the baton was successfully passed to another service which takes care of completing the user's transaction. This is why you want to use one of the three services: SQS, SF, SNS. A minimal sketch of the SQS variant is below.
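A sketch of the SQS hand-off from the API-facing function, assuming aws-sdk v2 and a placeholder queue URL (a worker Lambda subscribed to the queue then performs the slow HTTP call, with retries and a DLQ handled by SQS itself):
const AWS = require('aws-sdk');
const sqs = new AWS.SQS({ apiVersion: '2012-11-05' });

exports.handler = async (event) => {
  // Hand the slow work to SQS instead of doing it here.
  await sqs.sendMessage({
    QueueUrl: '<YOUR_QUEUE_URL>',
    MessageBody: JSON.stringify({ url: event.queryStringParameters.url })
  }).promise();

  // Tell the caller the work was accepted, not that it finished.
  return { statusCode: 202, body: 'Accepted' };
};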

Invoke AWS Lambda and return response to API Gateway asynchronously

My use case is such that I'll have an AWS Lambda fronted by API Gateway.
My requirement is that once the Lambda is invoked it should return a 200 OK response back to API Gateway, which then forwards it to the caller.
And then the Lambda should start its actual processing of the payload.
The reason for this is that the API Gateway caller service expects a response within 10 seconds else it times out. So I want to give the response before I start with the processing.
Is this possible?
With API Gateway's "Lambda Function" integration type, you can't do this with a single Lambda function -- that interface is specifically designed to be synchronous. The workaround, if you want to use the Lambda Function integration type is for the synchronous Lambda function, invoked by the gateway, to invoke a second, asynchronous, Lambda function through the Lambda API.
However, asynchronous invocations are possible without the workaround, using an AWS Service Proxy integration instead of a Lambda Function integration.
If your API makes only synchronous calls to Lambda functions in the back end, you should use the Lambda Function integration type. [...]
If your API makes asynchronous calls to Lambda functions, you must use the AWS Service Proxy integration type described in this section. The instructions apply to requests for synchronous Lambda function invocations as well. For the asynchronous invocation, you must explicitly add the X-Amz-Invocation-Type:Event header to the integration request.
http://docs.aws.amazon.com/apigateway/latest/developerguide/integrating-api-with-aws-services-lambda.html
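To make that concrete, here is one way to express such a service proxy integration as an OpenAPI (Swagger) extension; treat it as a sketch, as region and function ARN are placeholders and the exact fields may vary by setup. The static 'Event' value mapped onto the integration request header is what makes the invocation asynchronous:
"x-amazon-apigateway-integration": {
  "type": "aws",
  "httpMethod": "POST",
  "uri": "arn:aws:apigateway:<region>:lambda:path/2015-03-31/functions/<function-arn>/invocations",
  "requestParameters": {
    "integration.request.header.X-Amz-Invocation-Type": "'Event'"
  },
  "responses": {
    "default": { "statusCode": "200" }
  }
}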
Yes, simply create two Lambda functions. The first Lambda function will be called by the API Gateway, will simply invoke the second Lambda function, and will then immediately return successfully so that the API Gateway can respond with an HTTP 200 to the client. The second Lambda function will then take as long as it needs to complete.
If anyone is interested, here is the code you can use for the two-lambdas approach. The code below is the first Lambda, which you should set up to call the second, longer-running Lambda. It executes in well under a second.
const Lambda = new (require('aws-sdk')).Lambda();

/**
 * Note: Step Functions, which are called out in many answers online, do NOT actually work in this case. The reason
 * being that if you use Sequential or even Parallel steps they both require everything to complete before a response
 * is sent. That means that this one will execute quickly but Step Functions will still wait on the other one to
 * complete, thus defeating the purpose.
 *
 * @param {Object} event The Event from Lambda
 */
exports.handler = async (event) => {
  let params = {
    FunctionName: "<YOUR FUNCTION NAME OR ARN>",
    InvocationType: "Event", // <--- This is KEY as it tells Lambda to start execution but immediately return / not wait.
    Payload: JSON.stringify(event)
  };
  // We have to wait for the invocation to at least be submitted. Otherwise this Lambda returns
  // before the second Lambda can be submitted to the backend queue for execution.
  await new Promise((resolve, reject) => {
    Lambda.invoke(params, function (err, data) {
      if (err) {
        reject(err);
      } else {
        resolve('Lambda invoked: ' + data);
      }
    });
  });
  // Always return 200 no matter what.
  return {
    statusCode: 200,
    body: "Event Handled"
  };
};
Check the answer here on how to set up an async invoke of the Lambda function. This will return 200 immediately to the client, while the Lambda processes on its own asynchronously.
https://stackoverflow.com/a/40982649/5679071

Lambda processing same SNS event multiple times?

I have an AWS Lambda function configured to process SNS events from a single topic. When the function runs it will potentially send out some other notifications and call context.succeed, or context.fail if some error occurs.
The problem is the same SNS event seems to be invoking the Lambda multiple times. Looking at the CloudWatch logs I see
START RequestId: cd7afdf8-2816-11e6-bca2-6f2e3027c5e1 Version: $LATEST
Which eventually ends
END RequestId: cd7afdf8-2816-11e6-bca2-6f2e3027c5e1
REPORT RequestId: cd7afdf8-2816-11e6-bca2-6f2e3027c5e1 ...
Immediately followed in the same log by a start for the exact same RequestID
START RequestId: cd7afdf8-2816-11e6-bca2-6f2e3027c5e1 Version: $LATEST
Looking in CloudWatch at the topic sending this SNS event, it seems to be Publishing and Delivering only one event, as I had expected, so it seems to be a Lambda-side problem. Does anyone know of any reason an event might trigger my Lambda multiple times like this?
Edit: I've noticed that this seems to happen when the Lambda fails. I don't see any retry configuration on the Lambda and wouldn't expect it to behave this way by default.
From the Amazon Lambda FAQs page:
https://aws.amazon.com/lambda/faqs/
Q: What happens if my Lambda function fails during processing an event?
On failure, Lambda functions being invoked synchronously will respond with an exception. Lambda functions being invoked asynchronously are retried at least 3 times, after which the event may be rejected. Events from Amazon Kinesis streams and Amazon DynamoDB streams are retried until the Lambda function succeeds or the data expires. Kinesis and DynamoDB Streams retain data for 24 hours.
I had a similar issue with a Lambda cron-job function. I needed to send a request to a service/API that runs for a couple of seconds before it can return the first byte of the response (no response processing needed), so my fix was like the following:
exports.handler = (event, context, callback) => {
  var https = require("https");
  var token = "xxxxxxxx";
  var options = {
    host: "api.xxxxx.com",
    path: "/manage/cronjob/video",
    method: "GET",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer " + token
    }
  };
  var req = https.request(options);
  req.write("");
  req.end();
  // Add a timeout before calling context.done(), so the request has time to be sent.
  setTimeout(function () {
    context.done();
    callback(null, 'Hello from Lambda');
  }, 1000); // This timeout should be lower than the Lambda's timeout.
};
In this example, we have an explicit callback to force the Lambda to exit before the function hits a timeout error and triggers a retry another two times.
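An alternative to guessing a fixed delay is to resolve once Node has flushed the request to the socket, using the standard 'finish' event on the request object; a sketch under that assumption (same hypothetical host and token as above):
const https = require('https');

// Sketch: wait only until the request has been handed off for transmission
// ('finish'), instead of a hard-coded setTimeout.
exports.handler = async (event) => {
  await new Promise((resolve, reject) => {
    const req = https.request({
      host: 'api.xxxxx.com',
      path: '/manage/cronjob/video',
      method: 'GET',
      headers: { 'Authorization': 'Bearer xxxxxxxx' }
    });
    req.on('error', reject);
    req.on('finish', resolve); // fires when the request body has been flushed
    req.end();
  });
  return 'Hello from Lambda';
};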
Hope it helps