Invoke an AWS Lambda function only after an Amazon DynamoDB export to Amazon S3 is totally complete - amazon-web-services

I am new to AWS and cloud technology in general, so please bear with me if the use case below is a trivial one.
I have a table in Amazon DynamoDB which I export to Amazon S3 using the exportTableToPointInTime API (ExportsToS3) on a schedule, every day at 6 AM. This is done by an AWS Lambda function like this:
const AWS = require("aws-sdk");
exports.handler = async (event) => {
    const dynamodb = new AWS.DynamoDB({ apiVersion: '2012-08-10' });
    const tableParams = {
        S3Bucket: '<s3-bucket-name>',
        TableArn: '<DynamoDB-Table-ARN>',
        ExportFormat: 'DYNAMODB_JSON'
    };
    await dynamodb.exportTableToPointInTime(tableParams).promise();
};
The CloudFormation template for the AWS Lambda function takes care of creating the Lambda roles and policies, along with scheduling via CloudWatch Events. This setup works, and the table is exported to the target Amazon S3 bucket every day at the scheduled time.
Next, after the export to Amazon S3 is complete, I want to invoke another Lambda function and pass it the export status for further processing.
The problem is that the above Lambda function finishes almost immediately, with the exportTableToPointInTime call returning a status of IN_PROGRESS.
I tried capturing the response of the above call like this:
const exportResponse = await dynamodb.exportTableToPointInTime(tableParams).promise();
console.log(exportResponse);
The output is:
{
    "ExportDescription": {
        "ExportArn": "****",
        "ExportStatus": "IN_PROGRESS",
        "StartTime": "2021-09-20T16:51:52.147000+05:30",
        "TableArn": "****",
        "TableId": "****",
        "ExportTime": "2021-09-20T16:51:52.147000+05:30",
        "ClientToken": "****",
        "S3Bucket": "****",
        "S3SseAlgorithm": "AES256",
        "ExportFormat": "DYNAMODB_JSON"
    }
}
I have obfuscated some values in the log with ****.
As can be seen, the exportTableToPointInTime API call does not wait for the table to be exported completely. If it did, it would have returned an ExportStatus of either COMPLETED or FAILED.
Is there a way to design this so that another Lambda function is invoked only when the export is actually complete?
For now I have tried a brute-force approach that works, but it seems inefficient: it sleeps in a loop, and the Lambda function keeps running for the entire duration of the export, which adds cost.
const AWS = require("aws-sdk");
exports.handler = async (event) => {
    const dynamodb = new AWS.DynamoDB({ apiVersion: '2012-08-10' });
    const tableParams = {
        S3Bucket: '<s3-bucket-name>',
        TableArn: '<DynamoDB-Table-ARN>',
        ExportFormat: 'DYNAMODB_JSON'
    };
    const exportResponse = await dynamodb.exportTableToPointInTime(tableParams).promise();
    const exportArn = exportResponse.ExportDescription.ExportArn;
    let exportStatus = exportResponse.ExportDescription.ExportStatus;
    const sleep = (waitTimeInMs) => new Promise(resolve => setTimeout(resolve, waitTimeInMs));
    do {
        await sleep(60000); // wait 1 minute, then call the listExports API
        const listExports = await dynamodb.listExports().promise();
        const filteredExports = listExports.ExportSummaries.filter(e => e.ExportArn === exportArn);
        const currentExport = filteredExports[0];
        exportStatus = currentExport.ExportStatus;
    }
    while (exportStatus === 'IN_PROGRESS');
    // Hand the final status over to the next function asynchronously
    const lambda = new AWS.Lambda();
    const paramsForInvocation = {
        FunctionName: 'another-lambda-function',
        InvocationType: 'Event',
        Payload: JSON.stringify({ 'ExportStatus': exportStatus })
    };
    await lambda.invoke(paramsForInvocation).promise();
};
How can this be improved, or is the above solution acceptable?
Thanks!!

One option is to define a waiter that waits until the export reaches the COMPLETED status. Note that exportTableToPointInTime itself only starts the export; the current status has to be polled, for example with describeExport.
As far as I can see, there are a few default waiters for DynamoDB already present, but none for exports, so you'll need to write your own (you can use the existing ones as an example).
A good post describing how to use and write a waiter can be found here.
This way, if the export takes less than 15 minutes, you'll be able to catch it within the Lambda time limit without needing a secondary Lambda.
If it takes longer than that, you'll need to decouple the two steps, for which there are multiple options, as suggested by @Schepo and @wahmd:
Using an S3 event on the other end
Using Amazon EventBridge
Using SNS
Combinations of the above.
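A minimal sketch of such a hand-written waiter, assuming the AWS SDK for JavaScript v2 and its describeExport call, which returns the current ExportDescription for an export ARN. The helper name waitForExport and its options are hypothetical, and the DynamoDB call is passed in as a function so the polling logic can be tried without AWS access.

```javascript
// Hypothetical helper: polls until the export leaves IN_PROGRESS or attempts run out.
// `describeExport` is any async function that takes { ExportArn } and resolves to
// { ExportDescription: { ExportStatus } }, for example:
//   (params) => dynamodb.describeExport(params).promise()
async function waitForExport(describeExport, exportArn, options = {}) {
  const { delayMs = 30000, maxAttempts = 25 } = options;
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { ExportDescription } = await describeExport({ ExportArn: exportArn });
    const status = ExportDescription.ExportStatus;
    if (status !== 'IN_PROGRESS') {
      return status; // COMPLETED or FAILED
    }
    if (attempt < maxAttempts) {
      await sleep(delayMs);
    }
  }
  throw new Error(`Export ${exportArn} still IN_PROGRESS after ${maxAttempts} attempts`);
}
```

With the assumed defaults the waiter sleeps at most 24 times for 30 seconds each, about 12 minutes, which leaves some headroom under the 15-minute Lambda limit. Longer exports still need one of the decoupled options.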

Context: we want to export the DynamoDB table content into an S3 bucket and trigger a lambda when the export is complete.
In CloudTrail there's an ExportTableToPointInTime event that is sent when the export is started, but no event for when the export is finished.
A way to trigger a Lambda once the export is completed is to create an S3 trigger configured as follows.
In particular:
The event type is "complete multipart upload" (s3:ObjectCreated:CompleteMultipartUpload); other event types do not seem to work, and I am not sure why.
I think the prefix can be omitted, but it's useful. It's composed of two parts:
The first part is the table name, content.
The second part, AWSDynamoDB, is set automatically by the export tool.
The suffix is the most important part. The last files created once the export is complete are manifest-summary.json and manifest-summary.md5, so the suffix must be set to one of these file names.
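As a rough sketch of the receiving side, the triggered Lambda can double-check that the object that fired the event really is the export manifest before doing any work. The function names below are hypothetical; the event shape is the standard S3 notification format (Records[].s3.bucket.name and Records[].s3.object.key).

```javascript
// Hypothetical helper: picks the completed-export manifests out of an S3 event.
function findExportManifests(event) {
  return (event.Records || [])
    .map((record) => ({
      bucket: record.s3.bucket.name,
      // S3 URL-encodes object keys in notifications and uses "+" for spaces.
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
    }))
    .filter(({ key }) =>
      key.includes('AWSDynamoDB/') && key.endsWith('/manifest-summary.json'));
}

// Assign this to exports.handler in the Lambda's entry file.
async function handler(event) {
  const manifests = findExportManifests(event);
  for (const { bucket, key } of manifests) {
    console.log(`Export finished: s3://${bucket}/${key}`);
    // Post-export processing would start here.
  }
  return manifests.length;
}
```

The check is redundant when the trigger's suffix filter is set as described, but it keeps the function safe if the filter is later loosened.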

For an await call, you are missing the "async" keyword on the handler.
Change
exports.handler = (event) => {
to
exports.handler = async event => {
Since the handler uses await, it needs the async keyword.
Let me know if it fixed your issue.
Also, I suspect you don't need .promise(), as the call might already return a promise. In any case, please try with and without it in case it still doesn't work.
After the DynamoDB await call, you can invoke another Lambda. This makes sure that your Lambda is invoked after the DynamoDB export call has completed.
To invoke the second Lambda, you can use either:
the invoke API of the AWS SDK, or
the putEvents API of EventBridge.
The latter option is better, as it decouples the two Lambdas, and the first Lambda does not have to wait until the second invocation is completed (this reduces Lambda run time and hence cost).
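For the EventBridge route, a small sketch of the request that would be passed to putEvents. The helper name and the Source and DetailType values are made up for illustration; only the Entries, Source, DetailType and Detail fields are part of the actual putEvents request shape.

```javascript
// Hypothetical helper: builds the params object for EventBridge putEvents.
// Source and DetailType are placeholder values chosen for this example.
function buildExportEvent(exportArn, exportStatus) {
  return {
    Entries: [
      {
        Source: 'custom.dynamodb.export',
        DetailType: 'DynamoDB Export Finished',
        // EventBridge expects Detail to be a JSON string, not an object.
        Detail: JSON.stringify({ exportArn, exportStatus }),
      },
    ],
  };
}

// Usage with the AWS SDK for JavaScript v2 (not executed here):
//   const eventBridge = new AWS.EventBridge();
//   await eventBridge.putEvents(buildExportEvent(arn, 'COMPLETED')).promise();
```

An EventBridge rule matching the chosen Source would then target the second Lambda, so the first function never calls it directly.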

Related

AWS Lambda: Turn off logging for certain requests

I have a Lambda function and an event that runs every 5 minutes to keep it warm.
exports.handler = async (event) => {
    if (await warmer(event)) {
        console.log("Warming");
        return 'warmed';
    }
}
I am wondering whether it is possible to turn off logging for the function when the if statement is true, i.e. when it is just a warming request.
Thanks
No, this is not possible. You can only turn logging off entirely, by removing the permission to write to CloudWatch Logs from the Lambda role.

Asynchronous HTTP request in AWS Lambda

I want to execute an HTTP request inside a Lambda function invoked by API Gateway. The problem is that the request takes a while to complete (<20 seconds), and I don't want the client waiting for a response. In my research on asynchronous requests, I learned that I can pass the X-Amz-Invocation-Type:Event header to make the request execute asynchronously; however, this isn't working, and the code still "waits" for the HTTP request to complete.
Below is my lambda code:
'use strict';
const https = require('https');
exports.handler = function (event, context, callback) {
    let requestUrl;
    requestUrl = event.queryStringParameters.url;
    https.get(requestUrl, (res) => {
        console.log('statusCode:', res.statusCode);
        console.log('headers:', res.headers);
        res.on('data', (d) => {
            process.stdout.write(d);
        });
    }).on('error', (e) => {
        console.error(e);
    });
    let response = {
        "statusCode": 200,
        "body": JSON.stringify(event.queryStringParameters)
    };
    callback(null, response);
};
Any help would be appreciated.
You can use two Lambda functions.
Lambda 1 is triggered by API Gateway then calls Lambda 2 asynchronously (InvocationType = Event) then returns a response to the user.
Lambda 2, once invoked, will trigger the HTTP request.
Whatever you do, don't use two lambda functions.
You can't control how lambda is being called, async or sync. The caller of the lambda decides that. For APIGW, it has decided to call lambda sync.
The possible solutions are one of:
SQS
Step Functions (SF)
SNS
In your API, you call out to one of these services, get back a success, and then immediately return a 202 to your caller.
If you have a high volume of single- or double-action executions, use SQS. If you have potentially long-running work with complex state logic, use SF. If for some reason you want to ignore my suggestions, use SNS.
Each of these can (and should) call back out to a Lambda. If you need to run for more than 15 minutes, they can call out to CodeBuild. Ignore the name of the service; it's just a Lambda that supports runs of up to 8 hours.
Now, why not use two Lambdas (L1, L2)? The answer is simple. Once you tell your users (202) that their async call was queued (SQS, SF, SNS), they'll expect it to work 100% of the time. But what happens if that L2 Lambda fails? It won't retry, it won't continue, and you may never know about it.
That L2 Lambda's handler no longer exists, so you don't know the state any more. Further, you could try to add logging to L2 with a wrapping try/catch, but many other types of failure could happen. Even if you have that, if CloudWatch is down, will you get the log? Possibly not; it just isn't a reliable strategy. Sure, if you are doing something you don't care about, you can build this architecture, but this isn't how real production solutions are built. You want a reliable process. You want to trust that the baton was successfully passed to another service, which takes care of completing the user's transaction. This is why you want to use one of the three services: SQS, SF, SNS.
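The hand-off described in this answer could be sketched as follows for the SQS case: the API-facing Lambda only enqueues the request and answers 202 once the queue has accepted it. The factory name createHandler is hypothetical, and the enqueue call is injected as a function so the flow can be tried without AWS access.

```javascript
// Hypothetical factory: `sendMessage` is any async function that enqueues a message,
// e.g. (params) => sqs.sendMessage(params).promise() with the AWS SDK v2.
function createHandler(sendMessage, queueUrl) {
  return async (event) => {
    try {
      await sendMessage({
        QueueUrl: queueUrl,
        MessageBody: JSON.stringify(event.queryStringParameters || {}),
      });
      // 202 Accepted: the work is queued, not finished.
      return { statusCode: 202, body: JSON.stringify({ queued: true }) };
    } catch (err) {
      console.error('Failed to enqueue request', err);
      return { statusCode: 500, body: JSON.stringify({ queued: false }) };
    }
  };
}
```

A second Lambda subscribed to the queue would then perform the slow HTTP request.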

Invoke AWS Lambda and return response to API Gateway asynchronously

My use case is such that I'll have an AWS Lambda front ended with API Gateway.
My requirement is that once the Lambda is invoked, it should return a 200 OK response to API Gateway, which forwards it to the caller.
And then the Lambda should start its actual processing of the payload.
The reason for this is that the API Gateway caller service expects a response within 10 seconds else it times out. So I want to give the response before I start with the processing.
Is this possible?
With API Gateway's "Lambda Function" integration type, you can't do this with a single Lambda function -- that interface is specifically designed to be synchronous. The workaround, if you want to use the Lambda Function integration type is for the synchronous Lambda function, invoked by the gateway, to invoke a second, asynchronous, Lambda function through the Lambda API.
However, asynchronous invocations are possible without the workaround, using an AWS Service Proxy integration instead of a Lambda Function integration.
If your API makes only synchronous calls to Lambda functions in the back end, you should use the Lambda Function integration type. [...]
If your API makes asynchronous calls to Lambda functions, you must use the AWS Service Proxy integration type described in this section. The instructions apply to requests for synchronous Lambda function invocations as well. For the asynchronous invocation, you must explicitly add the X-Amz-Invocation-Type:Event header to the integration request.
http://docs.aws.amazon.com/apigateway/latest/developerguide/integrating-api-with-aws-services-lambda.html
Yes, simply create two Lambda functions. The first Lambda function will be called by the API Gateway and will simply invoke the second Lambda function and then immediately return successfully so that the API Gateway can respond with an HTTP 200 to the client. The second Lambda function will then take as long as it needs to complete.
If anyone is interested, here is the code you can use for the two-Lambda approach. The code below is the first Lambda, which you should set up so that it calls the second, longer-running Lambda. It takes well under a second to execute.
const Lambda = new (require('aws-sdk')).Lambda();
/**
 * Note: Step Functions, which are called out in many answers online, do NOT actually work in this case. The reason
 * being that if you use Sequential or even Parallel steps they both require everything to complete before a response
 * is sent. That means that this one will execute quickly but Step Functions will still wait on the other one to
 * complete, thus defeating the purpose.
 *
 * @param {Object} event The Event from Lambda
 */
exports.handler = async (event) => {
    let params = {
        FunctionName: "<YOUR FUNCTION NAME OR ARN>",
        InvocationType: "Event", // <--- This is KEY as it tells Lambda to start execution but immediately return / not wait.
        Payload: JSON.stringify(event)
    };
    // we have to wait for it to at least be submitted. Otherwise Lambda runs too fast and will return before
    // the Lambda can be submitted to the backend queue for execution
    await new Promise((resolve, reject) => {
        Lambda.invoke(params, function(err, data) {
            if (err) {
                reject(err); // reject accepts a single argument; the stack is available on err
            }
            else {
                resolve('Lambda invoked: ' + data);
            }
        });
    });
    // Always return 200 no matter what
    return {
        statusCode: 200,
        body: "Event Handled"
    };
};
Check the answer here on how to set up an async invoke of the Lambda function. This will return 200 immediately to the client, while the Lambda processes on its own asynchronously.
https://stackoverflow.com/a/40982649/5679071

Api Gateway: AWS Subdomain for Lambda Integration

I'm attempting to integrate my lambda function, which must run async because it takes too long, with API gateway. I believe I must, instead of choosing the "Lambda" integration type, choose "AWS Service" and specify Lambda. (e.g. this and this seem to imply that.)
However, I get the message "AWS ARN for integration must contain path or action" when I attempt to set the AWS Subdomain to the ARN of my Lambda function. If I set the subdomain to just the name of my Lambda function, when attempting to deploy I get "AWS ARN for integration contains invalid path".
What is the proper AWS Subdomain for this type of integration?
Note that I could also take the advice of this post and set up a Kinesis stream, but that seems excessive for my simple use case. If that's the proper way to resolve my problem, happy to try that.
Edit: Included screen shot
Edit: Please see comment below for an incomplete resolution.
So it's pretty annoying to set up, but here are two ways:
Set up a regular Lambda integration and then add the InvocationType header described here http://docs.aws.amazon.com/lambda/latest/dg/API_Invoke.html. The value should be 'Event'.
This is annoying because the console won't let you add headers when you have a Lambda function as the Integration type. You'll have to use the SDK or the CLI, or use Swagger where you can add the header easily.
Set the whole thing up as an AWS integration in the console (this is what you're doing in the question), just so you can set the InvocationType header in the console
Leave subdomain blank
"Use path override" and set it to /2015-03-31/functions/<FunctionARN>/invocations where <FunctionARN> is the full ARN of your lambda function
HTTP method is POST
Add a static header X-Amz-Invocation-Type with value 'Event'
http://docs.aws.amazon.com/lambda/latest/dg/API_Invoke.html
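To make the second option concrete, here is a small sketch that assembles the values described in the steps above for a given function ARN. The helper and its property names are hypothetical and simply mirror the console fields; they are not an API Gateway request shape. The single quotes inside the header value mark it as a static value, as in the step above.

```javascript
// Hypothetical helper: collects the console settings for an async AWS-service integration.
function buildAsyncLambdaIntegration(functionArn) {
  return {
    httpMethod: 'POST',
    // Path of the Lambda Invoke API, with the full function ARN embedded.
    pathOverride: `/2015-03-31/functions/${functionArn}/invocations`,
    // Static header telling Lambda to queue the invocation and return immediately.
    headers: { 'X-Amz-Invocation-Type': "'Event'" },
  };
}

// Example with a made-up ARN:
console.log(buildAsyncLambdaIntegration(
  'arn:aws:lambda:us-east-1:123456789012:function:my-function'
).pathOverride);
```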
The other option, which I did, was to still use the Lambda configuration and use two lambdas. The first (code below) runs in under a second and returns immediately. But, what it really does is fire off a second lambda (your primary one) that can be long running (up to the 15 minute limit) as an Event. I found this more straightforward.
const Lambda = new (require('aws-sdk')).Lambda();
/**
 * Note: Step Functions, which are called out in many answers online, do NOT actually work in this case. The reason
 * being that if you use Sequential or even Parallel steps they both require everything to complete before a response
 * is sent. That means that this one will execute quickly but Step Functions will still wait on the other one to
 * complete, thus defeating the purpose.
 *
 * @param {Object} event The Event from Lambda
 */
exports.handler = async (event) => {
    let params = {
        FunctionName: "<YOUR FUNCTION NAME OR ARN>",
        InvocationType: "Event", // <--- This is KEY as it tells Lambda to start execution but immediately return / not wait.
        Payload: JSON.stringify(event)
    };
    // we have to wait for it to at least be submitted. Otherwise Lambda runs too fast and will return before
    // the Lambda can be submitted to the backend queue for execution
    await new Promise((resolve, reject) => {
        Lambda.invoke(params, function(err, data) {
            if (err) {
                reject(err); // reject accepts a single argument; the stack is available on err
            }
            else {
                resolve('Lambda invoked: ' + data);
            }
        });
    });
    // Always return 200 no matter what
    return {
        statusCode: 200,
        body: "Event Handled"
    };
};

Invoke multiple aws lambda functions

How can we invoke multiple AWS Lambda functions one after the other? For example, an AWS Lambda chain consists of 8 separate Lambda functions, each of which simulates a 1-second processing event and then invokes the next function in the chain.
I wouldn't recommend using direct invoke to launch your functions. Instead you should consider creating an SNS Topic and subscribing your Lambda functions to this topic. Once a message is published to your topic, all functions will fire at the same time. This solution is also easily scalable.
See more information at official documentation Invoking Lambda functions using Amazon SNS notifications
With python:
import json
from boto3 import client as botoClient

lambdas = botoClient("lambda")

def lambda_handler(event, context):
    # RequestResponse is synchronous: each call returns only after that function has finished
    response1 = lambdas.invoke(FunctionName="myLambda1", InvocationType="RequestResponse", Payload=json.dumps(event))
    response2 = lambdas.invoke(FunctionName="myLambda2", InvocationType="RequestResponse", Payload=json.dumps(event))
A simple way to do it is to use the AWS sdk to invoke the lambda function.
The solution would look different depending on which SDK you use. With the Node SDK, I would suggest chaining the calls with promises, either through a promise library such as Bluebird or through the .promise() method that AWS SDK v2 request objects already provide.
The code would look something like:
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda({ apiVersion: '2015-03-31' });
// The default invocation type, RequestResponse, resolves only after the invoked function has finished
lambda.invoke({ FunctionName: 'FirstLambdaFunction' }).promise()
    .then(() => {
        // handle successful response from first lambda
        return lambda.invoke({ FunctionName: 'SecondLambdaFunction' }).promise();
    })
    .then(() => lambda.invoke({ FunctionName: 'ThirdLambdaFunction' }).promise())
    .catch(err => {
        // Handle error response
    });
The reason why I like this approach is that you own the context of all the lambdas and can decide to do whatever you like with the different responses.
Just call the next Lambda function at the end of each function?
Use http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Lambda_20141111.html#invokeAsync-property if you are using Node.js/JavaScript.